Abstract

The purpose of this article is to obtain a better understanding of the
extended variational principle (EVP). The EVP is a formula for the
thermodynamic pressure of a statistical mechanical system as a limit of a
sequence of minimization problems. It was developed for disordered
mean-field spin systems, that is, spin systems where the underlying
Hamiltonian is itself random and its distribution is permutation
invariant. We present the EVP in the simpler setting of classical
mean-field spin systems, where the Hamiltonian is non-random and
symmetric. The EVP essentially solves these models. We compare the EVP
with another method for mean-field spin systems: the self-consistent
mean-field equations. The two approaches lead to dual convex optimization
problems. This is a new connection, and it permits a generalization of
the EVP.

1 Introduction

The extended variational principle (EVP) was introduced in [2] by
Aizenman, Sims, and one of the present authors. It was applied to a
mean-field disordered spin system, known as the Sherrington-Kirkpatrick
spin glass. This is an Ising spin system whose underlying Hamiltonian is
random, such that the joint distribution of the coupling constants is
permutation symmetric. The purpose of the EVP there was to give a
variational formulation of the pressure, different from the usual Gibbs
variational principle (GVP). For spin glasses, it seems that the GVP does
not yield a useful characterization of the pressure because of the
complicated dependence of that formula on the random coupling constants.

The EVP was used to re-derive upper bounds on the quenched pressure
originally proved by Guerra in [13]. Also, the proof in [2] helps to unify
that bound with the earlier proof of existence of the quenched pressure
by Guerra and Toninelli [14]. Moreover, the approach of [2] introduced
the new concept of "random overlap structures", of which Ruelle's random
probability cascade (RPC) [23] seems to give distinguished examples,
having certain invariance properties. On the other hand, the sequence of
variational formulas that comprises the EVP is still difficult to work
with. For example, the Euler-Lagrange equations were not derived.

Shortly after the preprint for [5], Talagrand announced a proof of the 
most interesting problem related to the Sherrington-Kirkpatrick model, 
namely "Parisi's ansatz". (C.f., Talagrand's paper 131 )| and his book |31|.1 
This does not diminish interest in the EVP and its relation to spin-glasses. 
There is hope that the new insight which could be gained by finding a 
proof of Parisi's ansatz based on the EVP and random overlap structures 
would lead to more general results. 

Since the Euler-Lagrange equations are so hard to determine for mean- 
field spin glasses, it seems like a good idea to consider mean-field classical 
spin systems, where the situation is easier. These are Ising-type spin sys- 
tems (and generalized versions) where the Hamiltonian is non-random, 
and permutation symmetric. It turns out that for such systems, not
only can the Euler-Lagrange equations be derived, they can be essentially
"solved".

For the spin systems just described, there is another method of solution,
called the "self-consistent mean-field equations". It consists of solving
an implicit, self-consistency equation for a 1-body measure. One way
to quickly derive the implicit formula is to write down the GVP. The GVP
requires one to optimize a certain function over the set of all
permutation-invariant N-body measures. But instead one optimizes just over
the restricted manifold of N-body product measures. The Euler-Lagrange
equation for the GVP on this restricted set gives the self-consistent
mean-field equation. Often one cannot explicitly solve this 1-body problem,
but the mere fact that it reduces an N-body problem to a 1-body problem
justifies calling this a "solution". The solution obtained by the EVP is
similar in that it also reduces the N-body problem to a 1-body problem.

In the course of our research, we were led to the beautiful and concise
paper of Fannes, Spohn and Verbeure [10], which treats mean-field quantum
spin systems and gives a rigorous justification of the self-consistent
mean-field equations. By specialization, their results also apply to
classical spin systems. In the classical case,² their method uses the Gibbs
variational formula, combined with de Finetti's theorem. We will call this
the Gibbs, de Finetti principle (GdFP), henceforth.

The de Finetti theorem says the following. Consider a countable number of
spins, indexed by sites of $\mathbb{N}$, say. A measure on
$\Omega^{\mathbb{N}}$ is called "exchangeable" if it is invariant under
all permutations of the arguments which fix all but a finite number of
them. The limit Gibbs measures all have this property by virtue of the
underlying symmetry of the Hamiltonians. The de Finetti theorem says that
the most general exchangeable measure is a mixture of i.i.d. product
measures on the spins indexed by $\mathbb{N}$. With further work, one can
restrict attention to the extreme measures, which are i.i.d. product
measures (so that the mixture is trivial).

One of the goals of our paper is to compare the EVP to the GdFP. 
Before describing the comparison, let us mention two other useful ap- 
proaches to solving mean-field spin systems, which we will not discuss 
in this paper. One approach is the "coherent states approach", which is 
useful for quantum spin systems. This was worked out by Lieb in [18]
for the large-spin limit of the Heisenberg model, and was also applied to
the Dicke Maser model by Hepp and Lieb in [15]. In fact, it seems to be
Hepp and Lieb's work on the Dicke Maser model which motivated Fannes,
Spohn and Verbeure. The other approach uses large deviation estimates.
A good reference is Ellis and Newman's paper on the Curie-Weiss model,
[7]. An advantage of [10] is its generality.

Coming back to the comparison of the EVP and the GdFP, let us 
say that both give the same information, when they work. This leads 
one to expect that there may be a more direct link between the two 

1 We are very grateful to B. Nachtergaele for bringing this paper to our attention. 

2 In the quantum case, their method uses a generalization of de Finetti's theorem by Størmer
[28], and an alternative to the Gibbs formulation suitable for quantum spin systems by two
of those authors 151 151 . 
approaches. Indeed there is. It is simplest to see in the 2-body case, when
the interaction defines a convex bilinear form on measures. Then the two 
problems can be viewed as dual optimization problems in the sense of 
convex variational analysis. More precisely, there is a joint "Lagrangian" 
which is a concave-convex function of two variables. Maximizing over the 
concave variable gives the nonlinear function which one needs to minimize 
in the EVP. Minimizing over the convex variable yields the nonlinear 
function which one needs to maximize in the GdFP. So the fact that both 
methods lead to the same quantity - the thermodynamic pressure - is a 
consequence of the fact that the max-min of the joint Lagrangian equals 
the min-max. 

In case the interaction is continuous and bounded, it is trivial to see 
that the min-max and the max-min are equivalent, even in the non-convex 
case, and for n-body interactions with n > 2. But for singular interac- 
tions, the equality is nontrivial. Nevertheless, it is true, and follows from 
a theorem called the Kneser, Fan theorem. This theorem is a generaliza- 
tion of the famous von Neumann minimax theorem. This allows one to 
generalize the extended variational principle to some models with singular 
interactions (e.g., Coulomb repulsions). 

As the reader will see, the EVP is easy to understand and prove, 
because it only uses estimates based on convexity and Jensen's inequality. 
In comparison, to prove the GdFP one must use properties of relative 
entropy, as well as the de Finetti theorem. Therefore, the latter is more 
complicated than the former. On the other hand, the GdFP is more 
robust. 

In conclusion, we would like to make one extrapolation to spin glasses, 
which is the following: it would be useful to have an analogue of de 
Finetti's theorem, suitable for spin glasses. By this we mean an intrinsic 
characterization of the limiting measures in spin-glasses in terms of an 
invariance principle with respect to some stochastic dynamics. (Note that
the proof of de Finetti's theorem in [10] actually characterizes the measures
on $\Omega^{\mathbb{N}}$ which are invariant under the shift on $\mathbb{N}$.) So far, there is one
result in this direction. It is the recent, interesting paper by Aizenman 
and Ruzmaikina, pQ, which characterizes the 1-level replica-symmetry- 
breaking RPC's by an invariance principle called "quasi-stationarity" . 

The layout of this paper is as follows. In Section 2 we give the definition
of what we mean by a mean-field spin system. In Section 3 we give the main
results related to the EVP. In Section 4 we determine the optimizers of
the EVP, and use this to give a simpler formula for the pressure. We
also state a generalization which we prove later, for singular interactions.
In Section 5 we recall the main results of the GdFP, as proved in [10]
(specialized to classical spin systems). In Section 6 we construct a joint
Lagrangian for the EVP and GdFP. We also prove the generalization of
the EVP from Section 4. In Section 7 we give the simplest example.

2 Mean-Field Spin Systems : Definition 

In this section, we define the notation and set-up for a "mean-field spin
system". For us a mean-field spin system is defined by a quadruple
$(\Omega, \alpha, n, \phi)$ where: $\Omega$ is a compact metric space;
$\alpha$ is a distinguished Borel probability measure on $\Omega$ called
the a priori measure; $n$ is a positive integer determining the number of
bodies in the interaction; and
$\phi : \Omega^n \to \mathbb{R} \cup \{+\infty\}$ is the n-body
interaction. It is useful that $\Omega$ is a compact metric space, and
that $\alpha$ is a Borel probability measure. (For example, this means
that $\alpha$ is regular.) This is the level of generality one will find
for classical spin systems in [17] and [26].

We denote the set of all Borel probability measures on $\Omega$ by
$\mathcal{M}_1^+(\Omega)$, so that $\alpha \in \mathcal{M}_1^+(\Omega)$.
We will assume that $\phi$ is a Borel measurable function and that it is
bounded from below. Furthermore, we assume that $\alpha$ and $\phi$ are
compatible in the sense that $\alpha^{\otimes n}(\phi) < \infty$.
(Henceforth, whenever $\mu$ is a measure on a $\sigma$-algebra and $f$ is
a measurable function on the same $\sigma$-algebra, we write $\mu(f)$ for
the integral of $f$ against $\mu$. We also write $f\mu$ for the [possibly
signed] measure such that $(f\mu)(A) = \mu(f \chi_A)$. We use tensor
notation to denote product measures.)

We will assume that $\phi$ is symmetric on $\Omega^n$ with respect to the
natural action of the symmetric group $\mathfrak{S}_n$, as fits with our
intention of studying a mean-field system. For each $N \ge n$, we define a
Hamiltonian $H_N : \Omega^N \to \mathbb{R} \cup \{+\infty\}$. For
$x = (x_1, \dots, x_N) \in \Omega^N$,

\[ H_N(x) := N \binom{N}{n}^{-1} \sum_{1 \le i(1) < \cdots < i(n) \le N} \phi(x_{i(1)}, \dots, x_{i(n)}) . \tag{1} \]

Note that $H_N$ is symmetric with respect to the natural action of
$\mathfrak{S}_N$. Equivalently, we can think of the underlying lattice as
being a complete graph.

For each $N \ge n$, the partition function is the number

\[ Z(N) := \int_{\Omega^N} \exp(-H_N(x)) \, d\alpha^{\otimes N}(x) \]

and the finite approximation to the pressure is

\[ p(N) := N^{-1} \log Z(N) . \]

The thermodynamic pressure is defined as the limit

\[ p^* := \lim_{N \to \infty} p(N) , \]

if it exists. We are primarily interested in the thermodynamic pressure.
Later we will recall a well-known result (Theorem 12) which guarantees
that the limit does always exist.

We have eliminated the inverse-temperature parameter $\beta$ by absorbing
it into the Hamiltonian. It will be fixed and finite for our entire
discussion.
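
The definitions of this section can be made concrete on a small example. The following is a minimal sketch, assuming a two-point spin space with the uniform a priori measure and a two-body interaction $\phi(s,t) = -J s t$; these choices, the value of `J`, and the function names are illustrative assumptions, not taken from the text. It evaluates $H_N$ as in (1) and computes $p(N)$ by brute-force enumeration, so it is only practical for small $N$.

```python
import itertools
import math

def hamiltonian(x, phi, n):
    """H_N(x) = N * C(N, n)^{-1} * sum of phi over n-subsets of sites, as in (1)."""
    N = len(x)
    total = sum(phi(*(x[i] for i in idx))
                for idx in itertools.combinations(range(N), n))
    return N * total / math.comb(N, n)

def pressure(N, phi, n, omega, alpha):
    """p(N) = N^{-1} log Z(N), where Z(N) integrates exp(-H_N) against the product measure."""
    Z = 0.0
    for idx in itertools.product(range(len(omega)), repeat=N):
        weight = math.prod(alpha[i] for i in idx)     # product a priori weight of the configuration
        spins = [omega[i] for i in idx]
        Z += weight * math.exp(-hamiltonian(spins, phi, n))
    return math.log(Z) / N

# Hypothetical example: spins in {-1, +1}, uniform a priori measure,
# ferromagnetic pair interaction with the inverse temperature absorbed into J.
J = 0.5

def phi(s, t):
    return -J * s * t

omega, alpha = [-1, +1], [0.5, 0.5]
for N in (2, 4, 8):
    print(N, pressure(N, phi, 2, omega, alpha))
```

For $N = 2$ the sum in (1) has a single term, so $Z(2) = \cosh(2J)$ in this example, which gives a quick sanity check on the normalization.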

3 The Extended Variational Principle 
3.1 Setup 

The extended variational principle is a method for calculating the pressure
of a family of Hamiltonians $(\tilde{H}_N)$ which are close to $(H_N)$. In
this section, we will assume that $\phi$ is a bounded function; i.e., we
assume that it is bounded above, in addition to being bounded below as in
the general set-up. With this assumption, the new Hamiltonians will be so
close to the old ones that the thermodynamic pressures will be equal (as
we will show).

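Let us define a function $\Phi : \mathcal{M}_1^+(\Omega) \to \mathbb{R}$ by

\[ \Phi(\mu) := \mu^{\otimes n}(\phi) . \]

Now, for each $N \in \mathbb{N}_+$, we may define a new Hamiltonian
$\tilde{H}_N : \Omega^N \to \mathbb{R}$ as

\[ \tilde{H}_N(x) := N \Phi(\mu_x) , \tag{2} \]

where for $x = (x_1, \dots, x_N) \in \Omega^N$,

\[ \mu_x := N^{-1} \sum_{i=1}^{N} \delta_{x_i} . \]

The measure $\mu_x$ is called the empirical measure of the point $x$.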

Note that, in the important case that $n = 2$, the main difference between
$H_N$ and $\tilde{H}_N$ is the appearance of self-interaction terms
$\phi(x_i, x_i)$, for $i = 1, \dots, N$. One intuitively expects that these
terms make a small contribution, since there are only $N$ of them, compared
to the total number of terms $N(N-1)/2$. However, if $\phi$ had an infinite
repulsion, so that $\phi(x_1, x_2) = +\infty$ whenever $x_1 = x_2$, this
would lead to a Hamiltonian $\tilde{H}_N = +\infty$, entirely dominated by
the self-interaction terms. This is why we must assume that $\phi$ is
bounded above, as well as below. Of course, this is a strong requirement,
excluding many physically interesting examples. (In the case $n > 2$, the
extra terms in $\tilde{H}_N$ are the natural generalizations of these,
where two or more of the indices coincide.)

We define

\[ \tilde{Z}(N) := \int_{\Omega^N} \exp(-\tilde{H}_N(x)) \, d\alpha^{\otimes N}(x) . \]

We define

\[ \tilde{p}(N) := N^{-1} \log \tilde{Z}(N) , \]

and we define $\tilde{p}^*$ as

\[ \tilde{p}^* := \lim_{N \to \infty} \tilde{p}(N) , \]

if it exists.

There is one more important condition which we put on $\phi$. We assume
that $\phi$ satisfies the necessary conditions so that
$\Phi : \mathcal{M}_1^+(\Omega) \to \mathbb{R}$ is either convex or
concave. It makes sense to speak of convexity or concavity of $\Phi$
because $\mathcal{M}_1^+(\Omega)$ is a convex set.



3.1.1 Equivalence of Thermodynamic Pressure 



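We will now show the relationship between $p$ and $\tilde{p}$, under the
assumption that $\phi$ is bounded. To begin, we observe that the energy
densities $N^{-1} H_N(x)$ and $N^{-1} \tilde{H}_N(x)$ are close; in fact

\[ \left| \frac{\tilde{H}_N(x)}{N} - \frac{H_N(x)}{N} \right| \le \frac{n(n-1)}{N} \, \|\phi\|_\infty . \tag{3} \]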
Indeed, this follows because

\[ \frac{\tilde{H}_N(x)}{N} = \mathbb{E}\big[ \phi(x_{I(1)}, \dots, x_{I(n)}) \big] , \]

where the indices $I(1), \dots, I(n)$ are i.i.d. random variables, which are
uniform on $\{1, \dots, N\}$, and

\[ \frac{H_N(x)}{N} = \mathbb{E}\big[ \phi(x_{I(1)}, \dots, x_{I(n)}) \,\big|\, I(1), \dots, I(n) \text{ are distinct} \big] . \]

Therefore,

\[ \left| \frac{\tilde{H}_N(x)}{N} - \frac{H_N(x)}{N} \right| \le 2 \, \|\phi\|_\infty \, \mathbb{P}\{ I(1), \dots, I(n) \text{ are not distinct} \} . \]

Then (3) follows by bounding the probability,

\[ \mathbb{P}\{ I(1), \dots, I(n) \text{ are not distinct} \} \le \sum_{1 \le j < k \le n} \mathbb{P}\{ I(j) = I(k) \} = \binom{n}{2} \frac{1}{N} . \]

Now we use an elementary inequality to bound the difference between $p(N)$
and $\tilde{p}(N)$, starting from (3). But since we will use the same bound
repeatedly hereafter, we will state it in some generality.

Suppose $X$ is a compact metric space, and $\theta \in \mathcal{M}_1^+(X)$
is a Borel probability measure on $X$. Define a function $\Psi$ on the set
of Borel measurable functions $f : X \to \mathbb{R} \cup \{\pm\infty\}$ as

\[ \Psi(f) := \log \theta(e^f) . \]

Then we have the following.

\[ |\Psi(f) - \Psi(g)| \le \|f - g\|_\infty . \tag{4} \]

Indeed, one sees that

\[ \theta(e^g) \le \|e^{g-f}\|_\infty \, \theta(e^f) , \]

which proves $\Psi(g) - \Psi(f) \le \|g - f\|_\infty$. The other inequality
follows symmetrically.

Using equation (4), together with (3), we see that

\[ |\tilde{p}(N) - p(N)| \le \frac{n(n-1)}{N} \, \|\phi\|_\infty . \]

In particular it implies the following.

Corollary 1. Under the assumption that $\|\phi\|_\infty < \infty$, the
thermodynamic pressures $p^*$ and $\tilde{p}^*$ either both exist or both
fail to exist. In case they both exist, they are equal.
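
The bound (3) and its consequence for the pressures can be checked by enumeration on a small system. The sketch below assumes $n = 2$, spins in $\{-1,+1\}$ with the uniform a priori measure, and a hypothetical bounded interaction $\phi(s,t) = 0.3\,s t$; none of these choices come from the text. It compares the worst-case gap between the energy densities, and the gap between $\tilde{p}(N)$ and $p(N)$, with $n(n-1)\|\phi\|_\infty / N$.

```python
import itertools
import math

def h_pair(x, phi):
    """H_N for n = 2: N * C(N, 2)^{-1} * sum over i < j of phi(x_i, x_j)."""
    N = len(x)
    s = sum(phi(x[i], x[j]) for i, j in itertools.combinations(range(N), 2))
    return N * s / math.comb(N, 2)

def h_tilde(x, phi):
    """tilde H_N = N * Phi(mu_x): N^{-1} times the sum over all pairs (i, j), diagonal included."""
    N = len(x)
    return sum(phi(a, b) for a in x for b in x) / N

def pressures(N, phi, omega=(-1, 1)):
    """Return (p(N), tilde p(N)) for the uniform a priori measure on omega."""
    z = zt = 0.0
    w = len(omega) ** (-N)                     # weight of each configuration
    for x in itertools.product(omega, repeat=N):
        z += w * math.exp(-h_pair(x, phi))
        zt += w * math.exp(-h_tilde(x, phi))
    return math.log(z) / N, math.log(zt) / N

def phi(s, t):
    return 0.3 * s * t                         # hypothetical interaction, sup norm 0.3

n, sup_norm = 2, 0.3
for N in (2, 4, 8):
    gap = max(abs(h_tilde(x, phi) - h_pair(x, phi)) / N
              for x in itertools.product((-1, 1), repeat=N))
    p, pt = pressures(N, phi)
    bound = n * (n - 1) * sup_norm / N
    print(N, gap, abs(pt - p), bound)
```

In this example the first two printed columns should not exceed the last one, in line with (3) and the inequality preceding Corollary 1.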
3.2 Results

In the bulk of this section we assume that $\Phi$ is convex. In Subsection
3.2.1 we will state what changes when $\Phi$ is concave.
The first main result is the following important fact.

Theorem 2. The sequence $(N \tilde{p}(N) : N \in \mathbb{N}_+)$ is
superadditive. That is, for every pair $N_1, N_2 \in \mathbb{N}_+$,

\[ (N_1 + N_2) \, \tilde{p}(N_1 + N_2) \ge N_1 \, \tilde{p}(N_1) + N_2 \, \tilde{p}(N_2) . \tag{5} \]

Moreover the sequence $(\tilde{p}(N) : N \in \mathbb{N}_+)$ converges in $\mathbb{R}$.
Remark: Compare to the main theorem in \14\ - Also compare to 

It is a well-known fact that for a superadditive sequence
$(X(N) : N \in \mathbb{N}_+)$, the limit of $N^{-1} X(N)$ exists, although
it is possibly equal to $+\infty$. (See the original by Fekete or problem
#98 of Pólya and Szegő [20] [for which there are English translations].)
Therefore, the importance of the second part of the theorem is that the
limit is not $+\infty$.
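
Superadditivity as in (5) can be observed numerically in a simple convex case. The sketch below assumes spins in $\{-1,+1\}$, the uniform a priori measure, and $\phi(s,t) = J s t$ with $J \ge 0$, so that $\Phi(\mu) = J m(\mu)^2$ is convex in $\mu$, where $m(\mu)$ is the mean; the model and the value of `J` are illustrative assumptions. Since $\tilde{H}_N$ depends on $x$ only through the number of $+1$ spins, $\tilde{Z}(N)$ reduces to a binomial sum.

```python
import math

def p_tilde(N, J):
    """tilde p(N) for spins in {-1,+1}, uniform alpha, phi(s,t) = J*s*t, so Phi(mu) = J*m^2."""
    terms = []
    for k in range(N + 1):                      # k = number of +1 spins
        m = (2 * k - N) / N                     # mean of the empirical measure
        log_w = math.log(math.comb(N, k)) - N * math.log(2.0)
        terms.append(log_w - N * J * m * m)     # log of (a priori weight * exp(-tilde H_N))
    top = max(terms)                            # log-sum-exp for numerical stability
    return (top + math.log(sum(math.exp(t - top) for t in terms))) / N

J = 0.7   # hypothetical coupling; J >= 0 makes Phi convex
for N1, N2 in [(1, 1), (3, 5), (10, 40)]:
    lhs = (N1 + N2) * p_tilde(N1 + N2, J)
    rhs = N1 * p_tilde(N1, J) + N2 * p_tilde(N2, J)
    print(N1, N2, lhs >= rhs)
```

In this example $\tilde{p}(N) \le 0$ for every $N$, which is a concrete instance of the upper bound that rules out the limit $+\infty$.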

The second main result is a variational formula for $p^*$. To set this
up, we require some definitions. We first note that, since $\Omega$ is a
compact metric space, the weak topology on $\mathcal{M}_1^+(\Omega)$ is
compact and metrizable. (Cf. [3T], Section IV.4 and Section V.5.) Thinking
of $\mathcal{M}_1^+(\Omega)$ as a compact metric space, we define
$\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$ as the set of Borel probability
measures on it, which is also compact and metrizable with the weak
topology. We also define $\mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$ to be
the set of all (positive) Borel measures $\rho$ such that

\[ 0 < \rho(\mathcal{M}_1^+(\Omega)) < \infty . \]

This is a cone whose base is the Choquet simplex
$\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$.

The main idea behind the extended variational principle is a physical
notion called the cavity step. Following the prescription in [5], we will
define a sequence of functions, which we call the cavity field functions.
There is a different cavity field function for each $N \in \mathbb{N}_+$,
corresponding to adding $N$ extra particles to a system whose size is
supposed to be much larger than $N$.

If the original system is large enough, then instead of considering a
configuration in $\Omega^M$ for some large $M$, we instead consider a
measure in $\mathcal{M}_1^+(\Omega)$. Note that because the Hamiltonian is
permutation invariant, we only ever need to consider configurations in
$\Omega^M$ modulo permutations. But using the empirical measures one can
embed the quotient space $\Omega^M / \mathfrak{S}_M$ into
$\mathcal{M}_1^+(\Omega)$ for each $M$. Since $\mathcal{M}_1^+(\Omega)$
contains each of these finite configuration spaces, it does make sense to
consider a large system size by replacing configurations in some
$\Omega^M$ by measures in $\mathcal{M}_1^+(\Omega)$.
It is useful to define
$\Phi^{(1)} : \mathcal{M}_1^+(\Omega) \times \mathcal{M}^+(\Omega) \to \mathbb{R}$ as

\[ \Phi^{(1)}(\nu, \mu) = n \, [\nu^{\otimes (n-1)} \otimes \mu](\phi) . \]

This is the directional derivative of $\Phi$ at $\nu$ in the direction $\mu$. I.e.,

\[ \frac{d}{dt} \Phi(\nu + t\mu) \Big|_{t=0} = \Phi^{(1)}(\nu, \mu) . \]
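
On a finite spin space the directional derivative can be checked directly. The following sketch assumes $n = 2$ and a three-point space, so that $\phi$ is a symmetric matrix `A` and $\Phi(\mu) = \sum_{a,b} \mu_a A_{ab} \mu_b$; the matrix entries and the measures are hypothetical. It compares a finite-difference quotient of $\Phi$ with $\Phi^{(1)}(\nu, \mu) = 2 \sum_{a,b} \nu_a A_{ab} \mu_b$.

```python
def Phi(mu, A):
    """Phi(mu) = mu^{(x)2}(phi) for n = 2 on a finite space: sum_{a,b} mu[a] A[a][b] mu[b]."""
    k = len(mu)
    return sum(mu[a] * A[a][b] * mu[b] for a in range(k) for b in range(k))

def Phi1(nu, mu, A):
    """Phi^{(1)}(nu, mu) = n [nu^{(x)(n-1)} (x) mu](phi), specialized to n = 2."""
    k = len(nu)
    return 2 * sum(nu[a] * A[a][b] * mu[b] for a in range(k) for b in range(k))

# Hypothetical symmetric interaction matrix on a three-point space.
A = [[1.0, 0.2, -0.5],
     [0.2, 0.0, 0.3],
     [-0.5, 0.3, 2.0]]
nu = [0.2, 0.5, 0.3]
mu = [0.6, 0.1, 0.3]

t = 1e-6
shifted = [v + t * m for v, m in zip(nu, mu)]
finite_diff = (Phi(shifted, A) - Phi(nu, A)) / t   # one-sided difference quotient at t = 0
print(finite_diff, Phi1(nu, mu, A))
```

For a quadratic $\Phi$ the remainder $\Phi(\nu + \mu) - \Phi(\nu) - \Phi^{(1)}(\nu, \mu)$ equals $\Phi(\mu)$ exactly, so the two printed numbers differ by roughly $t \, \Phi(\mu)$.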
For each $N \in \mathbb{N}_+$, we define two functions from
$\mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$ to $\mathbb{R}$. These are

\[ G_N^{(1)}(\rho) := N^{-1} \log \int_{\mathcal{M}_1^+(\Omega)} \int_{\Omega^N} \exp\big[ -N \Phi^{(1)}(\nu, \mu_x) \big] \, d\alpha^{\otimes N}(x) \, d\rho(\nu) ; \]

and

\[ G_N^{(2)}(\rho) := N^{-1} \log \int_{\mathcal{M}_1^+(\Omega)} \exp\big( -N \big[ \Phi^{(1)}(\nu, \nu) - \Phi(\nu) \big] \big) \, d\rho(\nu) . \]

We define the cavity field function (for addition of $N$ particles to a large
system) as

\[ G_N(\rho) := G_N^{(1)}(\rho) - G_N^{(2)}(\rho) . \tag{6} \]

This function is homogeneous of degree 0. This means that

\[ \forall \rho \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega)), \ \forall t \in (0, \infty) : \quad G_N(\rho) = G_N(t\rho) . \]

This fact is obvious because scaling by $t$ simply adds the same constant,
$N^{-1} \log t$, to each of $G_N^{(1)}(\rho)$ and $G_N^{(2)}(\rho)$, which
cancels in the difference.

For every measure $\rho \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$, there
exists a $t \in (0, \infty)$ such that $t\rho$ is actually a probability
measure. Therefore, we could restrict attention to
$\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$. But it is sometimes useful to be
free of the constraint that all measures should be normalized. One easily
sees that $G_N$ is bounded on $\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$
using equation (4). Therefore, using homogeneity, it is bounded on
$\mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$. Moreover, using the monotone
class theorem, and the fact that $\Phi$ is Borel-measurable, one can check
that $G_N$ is Borel-measurable (cf. Section 1.3). If $\phi$ is continuous,
then $\Phi$ is continuous, and it is clear that then $G_N$ is also
continuous.

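The main theorem for this section is the following characterization of
the pressure.

Theorem 3 (EVP). For each $N \in \mathbb{N}_+$,

\[ \tilde{p}(N) \le \inf_{\rho} G_N(\rho) , \tag{7} \]

where the infimum is taken over $\rho \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$. Moreover,

\[ p^* = \tilde{p}^* = \lim_{N \to \infty} \inf_{\rho_N} G_N(\rho_N) , \tag{8} \]

where, for each $N \in \mathbb{N}_+$, we infimize over
$\rho_N \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$ separately.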
Remark: Compare to the main theorem in f^f. 

We will prove this theorem, as well as Theorem 2, in the next section.
First we state what changes if $\Phi$ is concave instead of convex.
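
The inequality (7) and the degree-0 homogeneity of $G_N$ can be probed numerically for finitely supported $\rho$. The sketch below assumes spins in $\{-1,+1\}$, the uniform a priori measure, and $\phi(s,t) = J s t$ with $J \ge 0$ (a convex case); the model, the parameter values, and the random choice of $\rho$ are all illustrative assumptions. A measure $\nu$ on two points is determined by its mean $m$, and for this model $\Phi^{(1)}(\nu, \mu_x) = 2 J m \, m(\mu_x)$, so the $x$-integral in $G_N^{(1)}$ factorizes.

```python
import math
import random

def log_sum_exp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def p_tilde(N, J):
    """tilde p(N) for spins in {-1,+1}, uniform alpha, Phi(nu) = J * m(nu)^2."""
    return log_sum_exp([math.log(math.comb(N, k)) - N * math.log(2.0)
                        - N * J * ((2 * k - N) / N) ** 2
                        for k in range(N + 1)]) / N

def G(N, J, means, weights):
    """G_N(rho) = G^(1) - G^(2) for rho = sum_j weights[j] * delta_{nu_j}, nu_j having mean means[j]."""
    # Phi^(1)(nu, mu_x) = 2*J*m*m_x, so the x-integral equals cosh(2*J*m)^N.
    g1 = log_sum_exp([math.log(w) + N * math.log(math.cosh(2 * J * m))
                      for m, w in zip(means, weights)]) / N
    # Phi^(1)(nu, nu) - Phi(nu) = (n - 1) * Phi(nu) = J * m^2 for n = 2.
    g2 = log_sum_exp([math.log(w) - N * J * m * m
                      for m, w in zip(means, weights)]) / N
    return g1 - g2

random.seed(0)
J, N = 0.7, 12                       # hypothetical values; J >= 0 is the convex case
for trial in range(5):
    means = [random.uniform(-1, 1) for _ in range(4)]
    weights = [random.uniform(0.1, 1) for _ in range(4)]
    scaled = [3 * w for w in weights]
    print(p_tilde(N, J) <= G(N, J, means, weights),
          abs(G(N, J, means, weights) - G(N, J, means, scaled)) < 1e-12)
```

Both printed flags are expected to be true in this example: the first reflects (7), the second the invariance of $G_N$ under rescaling of $\rho$.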

3.2.1 Changes for the concave case 

In the concave case the sequence of finite approximations to the pressure
is subadditive instead of superadditive, so that the inequality in (5), from
Theorem 2, is reversed. In the extended variational principle, Theorem 3,
the inequality of (7) is reversed, and the infimum is replaced by the
supremum. The identity in (8) still holds, but with the infimum replaced
by the supremum.
3.3 Proofs

All proofs are exactly symmetric between the convex and concave cases
for $\Phi$. So we will only give proofs of the convex case.

Proof of Theorem 2. One needs to show
$\tilde{Z}(M + N) \ge \tilde{Z}(M) \, \tilde{Z}(N)$ for every
$M, N \in \mathbb{N}_+$. I.e.,

\[ \int_{\Omega^{M+N}} \exp\big( -\tilde{H}_{M+N}(z) \big) \, d\alpha^{\otimes (M+N)}(z) \ge \int_{\Omega^N \times \Omega^M} \exp\big( -[\tilde{H}_N(x) + \tilde{H}_M(y)] \big) \, d\alpha^{\otimes N}(x) \, d\alpha^{\otimes M}(y) . \]

Rewriting $z = (x, y)$, this inequality follows by proving

\[ \tilde{H}_{M+N}((x, y)) \le \tilde{H}_N(x) + \tilde{H}_M(y) , \]

which is equivalent to

\[ (M + N) \, \Phi(\mu_{(x,y)}) \le N \, \Phi(\mu_x) + M \, \Phi(\mu_y) , \tag{9} \]

using the definition (2). But $\mu_{(x,y)}$ is a convex combination

\[ (M + N) \cdot \mu_{(x,y)} = N \cdot \mu_x + M \cdot \mu_y . \]

So (9) follows from convexity of $\Phi$ and Jensen's inequality.

It is a well-known fact that for superadditive sequences
$(X(N) : N \in \mathbb{N}_+)$, the limit of $N^{-1} X(N)$ exists, although
it is possibly equal to $+\infty$. See Lemma 4 below. Therefore,
$(\tilde{p}(N) : N \in \mathbb{N}_+)$ converges in
$\mathbb{R} \cup \{+\infty\}$. But there are obvious upper bounds which rule
out the limit $+\infty$, namely $\tilde{p}(N) \le \|\phi\|_\infty$. ■

The first half of Theorem 3 is easy to prove. We only need to use
convexity.

Proof of Theorem 3, Equation (7). It suffices to show that

\[ \tilde{p}(N) + G_N^{(2)}(\rho) \le G_N^{(1)}(\rho) , \]

for every $\rho \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$. Direct calculation yields

\[ \tilde{p}(N) + G_N^{(2)}(\rho) = N^{-1} \log \int_{\mathcal{M}_1^+(\Omega)} \int_{\Omega^N} \exp(-N f(\nu, x)) \, d\rho(\nu) \, d\alpha^{\otimes N}(x) , \]

where

\[ f(\nu, x) = \Phi(\mu_x) - \Phi(\nu) + \Phi^{(1)}(\nu, \nu) . \]

Similarly,

\[ G_N^{(1)}(\rho) = N^{-1} \log \int_{\mathcal{M}_1^+(\Omega)} \int_{\Omega^N} \exp\big( -N \Phi^{(1)}(\nu, \mu_x) \big) \, d\rho(\nu) \, d\alpha^{\otimes N}(x) . \]

Therefore, the inequality holds by showing that

\[ \Phi(\mu_x) - \Phi(\nu) - \Phi^{(1)}(\nu, \mu_x - \nu) \ge 0 . \tag{10} \]

But one easily checks that for $0 < t < 1$

\[ \frac{d}{dt} \Phi(t \cdot \mu_x + (1 - t) \cdot \nu) = \Phi^{(1)}(t \cdot \mu_x + (1 - t) \cdot \nu ; \, \mu_x - \nu) . \]

Using this, (10) is a standard consequence of convexity of $\Phi$. ■

For the proof of the second half of Theorem 3, we will rely on the
following lemma. Although it is well-known that the limit of
$N^{-1} X(N)$ exists when $(X(N) : N \in \mathbb{N}_+)$ is a superadditive
sequence, there is another simple fact which is not as well-known, but
which is essential to the extended variational principle. This was used,
notably, in [2]. We repeat the proof here, for completeness.

Lemma 4. Let $(X(N) : N \in \mathbb{N}_+)$ be a superadditive sequence. Then

\[ \lim_{N \to \infty} \frac{X(N)}{N} = \lim_{N \to \infty} \liminf_{M \to \infty} \frac{X(M + N) - X(M)}{N} . \]

Proof of Lemma 4. For $M, N \in \mathbb{N}_+$, define

\[ Y(M, N) := \frac{X(M + N) - X(M)}{N} \]

and $Y(N) := \liminf_{M \to \infty} Y(M, N)$. By superadditivity,
$Y(M, N) \ge N^{-1} X(N)$ for all $M \in \mathbb{N}_+$, therefore

\[ Y(N) \ge N^{-1} X(N) . \tag{11} \]

Suppose that $k, M, N \in \mathbb{N}_+$ and that $r \ge M$. Then by a
telescoping sum

\[ Y(r, kN) = \frac{1}{k} \sum_{j=0}^{k-1} Y(r + jN, N) \ge \inf_{M' \ge M} Y(M', N) . \tag{12} \]

Given $M, N \in \mathbb{N}_+$, define $\mathbb{N}_+$-valued functions
$k, r : [M + N, \infty) \to \mathbb{N}$, which are uniquely specified³ by
the requirements $n = k(n) N + r(n)$ with the remainder in the range
$r(n) \in [M, M + N - 1]$. Then one can easily see

\[ \liminf_{n \to \infty} \frac{X(n)}{n} = \liminf_{n \to \infty} \frac{X(n) - X(r(n))}{n - r(n)} = \liminf_{n \to \infty} Y(r(n), k(n) N) . \]

Using (12), this implies

\[ \liminf_{n \to \infty} \frac{X(n)}{n} = \liminf_{k \to \infty} \min_{r \in [M, M+N-1]} Y(r, kN) \ge \inf_{M' \ge M} Y(M', N) . \]

Taking the monotone limit in $M$, we obtain

\[ \liminf_{n \to \infty} \frac{X(n)}{n} \ge Y(N) . \]

Therefore, by (11),

\[ \liminf_{n \to \infty} \frac{X(n)}{n} \ge Y(N) \ge \frac{X(N)}{N} . \]

Taking the limsup in $N$ shows that $(N^{-1} X(N) : N \in \mathbb{N}_+)$
converges. Then by the sandwich theorem, $(Y(N) : N \in \mathbb{N}_+)$ also
converges, to the same limit. ■

³ In Matlab language, k(n) = div(n − M, N) and r(n) = M + mod(n − M, N).
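
The decomposition $n = k(n) N + r(n)$ and the telescoping identity behind (12) are easy to check in code. The following sketch uses a hypothetical superadditive sequence $X(N) = N - \sqrt{N}$, chosen only for illustration, and the helper names are not from the text.

```python
def decompose(n, M, N):
    """Return (k, r) with n = k*N + r and M <= r <= M + N - 1, for n >= M + N."""
    k, rem = divmod(n - M, N)
    return k, M + rem

def Y(X, M, N):
    """Increment quotient Y(M, N) = (X(M + N) - X(M)) / N."""
    return (X(M + N) - X(M)) / N

def X(N):
    """Hypothetical superadditive sequence: sqrt is subadditive, so N - sqrt(N) is superadditive."""
    return N - N ** 0.5

M, N = 5, 3
for n in range(M + N, M + N + 6):
    k, r = decompose(n, M, N)
    assert n == k * N + r and M <= r <= M + N - 1 and k >= 1
    # Telescoping identity behind (12): Y(r, kN) is the average of k one-step quotients.
    avg = sum(Y(X, r + j * N, N) for j in range(k)) / k
    assert abs(Y(X, r, k * N) - avg) < 1e-12
    print(n, k, r)
```

The assertions encode the two facts used in the proof: the remainder stays in the window $[M, M+N-1]$, and $Y(r, kN)$ is an average of the quotients $Y(r + jN, N)$.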

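It seems that much of the physicists' so-called cavity step is encoded
in Lemma 4. Using it, we now complete the proof of Theorem 3.

Proof of Theorem 3, Equation (8). By (7),

\[ \tilde{p}^* = \liminf_{N \to \infty} \tilde{p}(N) \le \liminf_{N \to \infty} \inf_{\rho_N} G_N(\rho_N) . \]

Therefore, (8) follows if one can prove

\[ \tilde{p}^* \ge \limsup_{N \to \infty} \inf_{\rho_N} G_N(\rho_N) . \tag{13} \]

For each $N \in \mathbb{N}_+$ suppose there were a sequence of measures
$(\rho_N^M : M \in \mathbb{N}_+)$ such that

\[ \lim_{M \to \infty} \left| G_N^{(1)}(\rho_N^M) - \frac{M + N}{N} \, \tilde{p}(M + N) \right| = 0 , \tag{14} \]

and

\[ \lim_{M \to \infty} \left| G_N^{(2)}(\rho_N^M) - \frac{M}{N} \, \tilde{p}(M) \right| = 0 . \tag{15} \]

Then it would follow

\[ \lim_{M \to \infty} \left| G_N(\rho_N^M) - \frac{(M + N) \, \tilde{p}(M + N) - M \, \tilde{p}(M)}{N} \right| = 0 . \]

Taking $(\rho_N^M : M \in \mathbb{N}_+)$ as a variational sequence, this would imply

\[ \liminf_{M \to \infty} \frac{(M + N) \, \tilde{p}(M + N) - M \, \tilde{p}(M)}{N} \ge \inf_{\rho_N} G_N(\rho_N) . \]

But by Lemma 4 applied to $X(N) = N \tilde{p}(N)$, this would give (13).
Therefore, it only remains to prove (14) and (15).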

The map $y \mapsto \mu_y$ is a continuous function from $\Omega^M$ to
$\mathcal{M}_1^+(\Omega)$ (with respect to the weak topology on the target).
Therefore, given any Borel measure
$\hat{\rho} \in \mathcal{M}_f^+(\Omega^M)$, there is a unique measure
$\rho \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$, called the
push-forward, such that

\[ \int_{\mathcal{M}_1^+(\Omega)} F(\nu) \, d\rho(\nu) = \int_{\Omega^M} F(\mu_y) \, d\hat{\rho}(y) , \]

for every continuous function $F : \mathcal{M}_1^+(\Omega) \to \mathbb{R}$
(continuous with respect to the weak topology).

Now consider a measure j$ G M~^(fl M ), absolutely continuous with 



respect to a 



such that 



da ®M 



(;/) 



exp 



M 
' M + N 



H M (y) 
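Let $\rho_N^M \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$ be the
push-forward of $\hat{\rho}_N^M$. Then one verifies

\[ G_N^{(1)}(\rho_N^M) = N^{-1} \log \int_{\Omega^M} \int_{\Omega^N} \exp\big( -N \Phi^{(1)}(\mu_y, \mu_x) \big) \, d\alpha^{\otimes N}(x) \, d\hat{\rho}_N^M(y) = N^{-1} \log \int_{\Omega^M \times \Omega^N} e^{-f(x,y)} \, d\alpha^{\otimes N}(x) \, d\alpha^{\otimes M}(y) , \tag{16} \]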
where 

One can prove (1141 for this sequence of measures. 

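One can write a formula analogous to equation (16) for $\tilde{p}(M + N)$,
namely

\[ \frac{M + N}{N} \, \tilde{p}(M + N) = N^{-1} \log \int_{\Omega^{M+N}} \exp\big( -(M + N) \, \Phi(\mu_z) \big) \, d\alpha^{\otimes (M+N)}(z) . \]

Using the decomposition $z = (x, y)$, this yields

\[ \frac{M + N}{N} \, \tilde{p}(M + N) = N^{-1} \log \int_{\Omega^M \times \Omega^N} e^{-g(x,y)} \, d\alpha^{\otimes N}(x) \, d\alpha^{\otimes M}(y) , \tag{17} \]

where $g(x, y) = (M + N) \, \Phi(\mu_{(x,y)})$.
Using the formula

\[ \mu_{(x,y)} = \frac{N \mu_x + M \mu_y}{M + N} = \mu_y + \frac{N}{M + N} \, [\mu_x - \mu_y] , \]

one writes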

g(x,y) ~ f(x,y) 
M + N 

= * ( My + A/ + AT ^ ~ ®^ ~ M + AT $(1> (^'^ 

A?" 2 

+ (M + 7V) 2 
Now, by Taylor's theorem, 

Af 2 d 2 



(Af + TV) 2 y 
and one easily calculates 



/ (* ~ *) 4a -Mv]) I NB dB, 



5* + 



) / / 4>(x)dv® n 2 (xi, . . . ,x n - 2 ) d^i 02 (x„- 1 ,x n ) . 

I j Jfln-2 J Q 2 
Therefore,



1/ - ffllsup < 



M + N 

By equation (4), and equations (16) and (17), this means



G«(^)-^±^p(M + jV) 



This certainly does converge to zero as $M \to \infty$, proving (14). The
argument for (15) is similar, and is left to the reader. ■



N 






M(")] 


~ M + N 





4 Optimizers for the EVP 

Proposition 5. Suppose that $\phi$ is continuous, and $\Phi$ is convex or
concave. Then, for each $N$, the optimum of $G_N$ is attained. Moreover, the
normalized optimizers (scaled to be probability measures) form a union of
faces of the Choquet simplex $\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$.

Proof. Restrict attention to the case that $\Phi$ is convex, since the
concave case is proved symmetrically. Note that $G_N$ is continuous. Since
$\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$ is compact, the minimum is
attained. This proves the first part of the proposition. The second part of
the proposition is equivalent to the following statement: let $\rho$ be any
minimizer in $\mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$; then for each
$\nu \in \mathrm{supp}(\rho)$, the measure $\delta_\nu$ is also a minimizer.

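Let $f$ be any Borel measurable function with $0 \le f \le 1$ such that
$\rho(f) > 0$. For $-1 < t < 1$, define
$\rho_t \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$ by

\[ \rho_t := (1 + t f) \, \rho . \]

Then, for $i = 1, 2$,

\[ G_N^{(i)}(\rho_t) = N^{-1} \log\big[ \exp\big( N G_N^{(i)}(\rho) \big) + t \exp\big( N G_N^{(i)}(\tilde{\rho}) \big) \big] , \]

where $\tilde{\rho} \in \mathcal{M}_f^+(\mathcal{M}_1^+(\Omega))$ is the
measure $\tilde{\rho} = f \rho$, using an obvious notation.

The two functions $t \mapsto G_N^{(i)}(\rho_t)$, for $i = 1, 2$, are
obviously differentiable on $(-1, \infty)$. Therefore by criticality,

\[ \frac{d}{dt} \big[ G_N^{(1)}(\rho_t) - G_N^{(2)}(\rho_t) \big] \Big|_{t=0} = 0 . \]

But careful consideration of this equation yields

\[ G_N(\tilde{\rho}) = G_N(\rho) . \]

So $\tilde{\rho}$ is another optimizer.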

Now, for any $\nu \in \mathrm{supp}(\rho)$ consider the family of functions
$f_\epsilon = \chi_{D(\nu; \epsilon)}$ (for $\epsilon > 0$), where $\chi$ is
the indicator and $D(\nu; \epsilon)$ is the closed ball, with reference to
any metric on $\mathcal{M}_1^+(\Omega)$ which yields the weak topology.
(Such a metric is guaranteed to exist since $\Omega$ is compact and hence
separable. Cf. Section V.5.) Since $\nu \in \mathrm{supp}(\rho)$, one knows
$\rho(f_\epsilon) > 0$ for all $\epsilon > 0$. The family of rescaled
measures $\rho(f_\epsilon)^{-1} f_\epsilon \, \rho$ converges weakly to
$\delta_\nu$ in the $\epsilon \downarrow 0$ limit. Using continuity of
$G_N$, that means $\delta_\nu$ is a minimizer, as claimed. ■

When $\rho$ has the simple form $\delta_\nu$ for some
$\nu \in \mathcal{M}_1^+(\Omega)$, all the values $G_N(\delta_\nu)$ (for
$N \in \mathbb{N}_+$) are identical, and are given by the function $g(\nu)$
written below. Therefore, the limit in Theorem 3, equation (8), is trivial.
We state this as the following:

Corollary 6. Define $g : \mathcal{M}_1^+(\Omega) \to \mathbb{R}$ by

\[ g(\nu) = (n - 1) \, \Phi(\nu) + \log \int_\Omega \exp\big( -\Phi^{(1)}(\nu; \delta_x) \big) \, d\alpha(x) . \tag{18} \]

Suppose that $\phi$ is continuous and $\Phi$ is convex. Then

\[ \tilde{p}^* = p^* = \min_{\nu \in \mathcal{M}_1^+(\Omega)} g(\nu) . \]

(If $\Phi$ is concave instead of convex, the minimum changes to the
maximum.)
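
Corollary 6 reduces the computation of the pressure to a one-body optimization, which can be illustrated numerically. The sketch below assumes $n = 2$, spins in $\{-1,+1\}$, the uniform a priori measure, and $\phi(s,t) = J s t$; these are illustrative assumptions, as are the coupling values and the grid search. In this model $g$ depends on $\nu$ only through its mean $m$, with $g = J m^2 + \log\cosh(2 J m)$, and the result is compared with $\tilde{p}(N)$ for growing $N$ in both the convex ($J > 0$) and concave ($J < 0$) cases.

```python
import math

def g(m, J):
    """g(nu) from (18) for n = 2, spins in {-1,+1}, uniform alpha, phi(s,t) = J*s*t.

    Only the mean m of nu matters: Phi(nu) = J*m^2 and Phi^(1)(nu; delta_x) = 2*J*m*x.
    """
    return J * m * m + math.log(math.cosh(2 * J * m))

def p_tilde(N, J):
    """Finite-N pressure tilde p(N), summed over the number k of +1 spins."""
    terms = [math.log(math.comb(N, k)) - N * math.log(2.0)
             - N * J * ((2 * k - N) / N) ** 2 for k in range(N + 1)]
    top = max(terms)
    return (top + math.log(sum(math.exp(t - top) for t in terms))) / N

grid = [i / 1000 for i in range(-1000, 1001)]   # candidate means m in [-1, 1]
J_convex, J_concave = 0.7, -0.7                 # hypothetical couplings
min_g = min(g(m, J_convex) for m in grid)       # convex Phi: minimize g
max_g = max(g(m, J_concave) for m in grid)      # concave Phi: maximize g
for N in (10, 100, 1000):
    print(N, p_tilde(N, J_convex), min_g, p_tilde(N, J_concave), max_g)
```

In the concave (ferromagnetic) case the maximizer of $g$ satisfies $m = \tanh(2|J|m)$, which is the familiar self-consistent mean-field equation for this model.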

4.1 Extension for Convex Two-Body Interactions 

Suppose we drop the restriction that $\phi$ is bounded, and only require that
$\phi : \Omega^n \to \mathbb{R} \cup \{+\infty\}$ is Borel measurable and bounded below, as in
Section |21 In this case pn may no longer exist (or rather it may equal 
$+\infty$, identically) for each $N \in \mathbb{N}_+$. But the cavity field
function $G_N$ is still well-defined and finite, if we put certain natural
restrictions on the measures $\rho$ which we use. The same is true for its
restriction to extreme points, defined by $g$. It is reasonable to ask
whether one can still determine $p^*$ (which may now be inequivalent to
$\tilde{p}^*$) using $g$. At least in some cases the answer is "yes".

Theorem 7. Suppose $n = 2$ and $\Phi$ is convex. For each $C > 0$, define

\[ \mathcal{M}_1^+(\Omega, \alpha, C) = \left\{ \nu \in \mathcal{M}_1^+(\Omega) : \nu \ll \alpha \text{ and } \left\| \frac{d\nu}{d\alpha} \right\|_\infty \le C \right\} . \]

Then

\[ p^* = \lim_{C \to \infty} \inf_{\nu \in \mathcal{M}_1^+(\Omega, \alpha, C)} g(\nu) . \]

Remarks: 1. Restricting to $\mathcal{M}_1^+(\Omega, \alpha, C)$ is a
technical necessity. If we do not put some restrictions on
$\nu \in \mathcal{M}_1^+(\Omega)$, then it is possible that the two summands
in (18) are $+\infty$ and $-\infty$. On the other hand, because
$\Phi(\alpha) < \infty$, both terms are finite when
$\nu \in \mathcal{M}_1^+(\Omega, \alpha, C)$ for some $C$. By taking the
$C \to \infty$ limit, at the end, we relax these restrictions. This is also
the condition that we need in Section 6 to apply the Kneser, Fan Theorem.

2. Our proof uses convexity of $\Phi$. It does not give the analogous
statement for the case that $\Phi$ is concave.

The proof of this fact will be given at the end of Section 6. It can be
seen as the motivation for the following two sections, though they are also
interesting on their own.

5 The Gibbs, de Finetti Principle

In this section we will give a pedagogical introduction to the paper of
Fannes, Spohn and Verbeure [10]. In fact, while they considered quantum
spin systems, which are more general, we specialize to the classical case.
In order to be self-contained, we will review the specialization of their
results.

5.1 Setup 

In this section we relax the conditions on $\phi$ relative to the previous
section. We only assume the conditions from Section 2. Namely, we assume
that $\phi : \Omega^n \to \mathbb{R} \cup \{+\infty\}$ is Borel measurable
and bounded below. We suppose that $\alpha^{\otimes n}(\phi) < \infty$ and
that $\phi$ is invariant under the natural action of $\mathfrak{S}_n$.

We will use two important principles, called the Gibbs variational
formula, and de Finetti's theorem. The Gibbs formula gives a variational
formulation for the finite-volume approximations to the pressure,
$(p(N) : N \ge n)$. The de Finetti theorem is a representation theorem
for all infinite exchangeable probability measures. When combined, these
two principles give a mathematically rigorous variational formula for the
thermodynamic pressure of a mean-field classical spin system, which the
physicists also use (but usually without referring to the rigorous
justification).

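We start by stating the Gibbs variational formula. The first step is
to recall entropy. Given a measure $\rho^N \in \mathcal{M}_1^+(\Omega^N)$, its relative entropy
with respect to $\alpha^{\otimes N}$ will be denoted as $S_N(\rho^N)$. (Usually the relative
entropy would be denoted $S_N(\rho^N, \alpha^{\otimes N})$, but we suppress $\alpha^{\otimes N}$.) This is
a quantity in $\mathbb{R} \cup \{-\infty\}$. If $\rho^N$ is absolutely continuous with respect to
$\alpha^{\otimes N}$, then

\[ S_N(\rho^N) = \int_{\Omega^N} \psi\!\left( \frac{d\rho^N}{d\alpha^{\otimes N}}(x) \right) d\alpha^{\otimes N}(x) , \]

where

\[ \psi(t) = \begin{cases} -t \log t & \text{if } t \in (0,\infty) , \\ 0 & \text{if } t = 0 . \end{cases} \]

Even if $\rho^N \ll \alpha^{\otimes N}$, the relative entropy may equal $-\infty$, depending on
the Radon-Nikodym derivative. If $\rho^N$ is not absolutely continuous with
respect to $\alpha^{\otimes N}$ (i.e., if the singular component has positive mass) then
$S_N(\rho^N)$ is defined to be $-\infty$.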

Henceforth we will refer to the quantity "relative entropy with respect to
$\alpha^{\otimes N}$" simply as "entropy". The following important properties of
the entropy, except for Property 6, are proved in the monographs by Israel
and Simon, respectively: [17], Section II.2, and [26], Section III.4. The
best reference for Property 6 is the seminal paper by Ruelle and Robinson,
[24]. One can also consult the monograph by Georgii [12], Chapter 15, for
related issues.

As a notational point, for $\rho^N \in \mathcal{M}_1^+(\Omega^N)$ and $A \subseteq [1,N]$, we denote
by $\rho^N \restriction A$ the measure in $\mathcal{M}_1^+(\Omega^{|A|})$, naturally identified as the marginal
of $\rho^N$ on the $\sigma$-subalgebra of Borel measurable functions on $\Omega^N$ depending
only on the coordinates of $x$ with indices in $A$.

Proposition 8 (Properties of Relative Entropy). The functions
$S_N : \mathcal{M}_1^+(\Omega^N) \to \mathbb{R} \cup \{-\infty\}$ (for $N \in \mathbb{N}$) have the following properties.

1. (Definition through continuous partitions)

\[ S_N(\rho^N) = \inf_{R} \, \inf_{(u_1,\dots,u_R)} \sum_{r=1}^{R} \psi\big( \rho^N(u_r) / \alpha^{\otimes N}(u_r) \big) \, \alpha^{\otimes N}(u_r) , \]

where $(u_1,\dots,u_R)$ varies over all continuous partitions of unity on
$\Omega^N$ such that $\alpha^{\otimes N}(u_r) > 0$ for each $r$.

2. (Non-positivity) $S_N(\rho^N) \le 0$ for all $\rho^N \in \mathcal{M}_1^+(\Omega^N)$, and equality
holds for $\rho^N = \alpha^{\otimes N}$.

3. (Upper semicontinuity) The function $S_N : \mathcal{M}_1^+(\Omega^N) \to \mathbb{R} \cup \{-\infty\}$
is upper semicontinuous with respect to the topology of weak
convergence.

4. (Strict concavity) For $\rho_1^N, \rho_2^N \in \mathcal{M}_1^+(\Omega^N)$ and $\theta \in (0,1)$,

\[ S_N\big( \theta \cdot \rho_1^N + (1-\theta) \cdot \rho_2^N \big) \ge \theta \, S_N(\rho_1^N) + (1-\theta) \, S_N(\rho_2^N) . \]

The inequality is strict if $S_N(\rho_i^N) > -\infty$ for both $i = 1,2$, unless
$\rho_1^N = \rho_2^N$.

5. ("Almost convexity") For the setting as above,

\[ S_N\big( \theta \cdot \rho_1^N + (1-\theta) \cdot \rho_2^N \big) \le \theta \, S_N(\rho_1^N) + (1-\theta) \, S_N(\rho_2^N) + \psi(\theta) + \psi(1-\theta) . \]

6. (Strong subadditivity) Given subsets $A, B \subseteq [1,N]$,

\[ S_{|A \cup B|}(\rho^N \restriction A \cup B) + S_{|A \cap B|}(\rho^N \restriction A \cap B) \le S_{|A|}(\rho^N \restriction A) + S_{|B|}(\rho^N \restriction B) . \]

For this to be consistent, we need to define $S_0$. The need arises when
one takes the marginal $\rho^N \restriction A \cap B$ and $A \cap B = \emptyset$. One can make
sense of this by defining $\Omega^0 = \{\emptyset\}$ to be the 1-point space, defining $\alpha^{\otimes 0}$ to
be the unique measure in $\mathcal{M}_1^+(\Omega^0)$, and defining $\rho^N \restriction \emptyset$ to be that same
measure no matter what $\rho^N \in \mathcal{M}_1^+(\Omega^N)$ may be. Then the appropriate
definition is obviously $S_0(\rho^N \restriction \emptyset) = 0$ for all $\rho^N \in \mathcal{M}_1^+(\Omega^N)$.
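
As a quick numerical illustration of Properties 2, 4 and 5, the following minimal sketch evaluates the relative entropy on a finite set, where the integral defining the entropy reduces to a sum. The five-point space, the random measures, and the names `psi`, `rel_entropy`, `random_prob` and `mix` are assumptions made only for this illustration; they are not taken from the text.

```python
import math
import random

def psi(t):
    """psi(t) = -t log t for t > 0, with psi(0) = 0."""
    return -t * math.log(t) if t > 0 else 0.0

def rel_entropy(rho, alpha):
    """Relative entropy S(rho) of rho with respect to alpha on a finite set."""
    total = 0.0
    for r, a in zip(rho, alpha):
        if a == 0.0:
            if r > 0.0:
                return -math.inf   # the singular part has positive mass
            continue
        total += psi(r / a) * a
    return total

def random_prob(k, rng):
    """A random probability vector with k strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def mix(theta, rho1, rho2):
    """Convex combination theta * rho1 + (1 - theta) * rho2."""
    return [theta * p + (1.0 - theta) * q for p, q in zip(rho1, rho2)]

rng = random.Random(0)
alpha = random_prob(5, rng)                      # hypothetical a priori measure
rho1, rho2 = random_prob(5, rng), random_prob(5, rng)
theta = 0.3

s_mix = rel_entropy(mix(theta, rho1, rho2), alpha)
lower = theta * rel_entropy(rho1, alpha) + (1 - theta) * rel_entropy(rho2, alpha)
upper = lower + psi(theta) + psi(1 - theta)
print("Property 2 (non-positivity):", rel_entropy(rho1, alpha) <= 0.0)
print("Properties 4 and 5 (concavity, almost convexity):", lower <= s_mix <= upper)
```

The two-sided bound `lower <= s_mix <= upper` is exactly the sandwich that makes the mean entropy affine in the thermodynamic limit, since the correction term `psi(theta) + psi(1 - theta)` does not grow with the number of sites.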

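Using Property 1 there is a stronger version of Property 4. Suppose
that $\rho^N = \sum_{r=1}^{R} \theta_r \rho_r^N$ for $\rho_1^N, \dots, \rho_R^N \in \mathcal{M}_1^+(\Omega^N)$, where
$\theta_1, \dots, \theta_R \ge 0$ are such that $\sum_{r=1}^{R} \theta_r = 1$. Then by iterating Property 4,

\[ S_N(\rho^N) \ge \sum_{r=1}^{R} \theta_r \, S_N(\rho_r^N) . \]

This will be particularly useful in the thermodynamic limit, $N \to \infty$,
when combined with Property 5. But we would like to generalize to allow
continuous convex combinations (barycentric decompositions).

Lemma 9. Let $(W, \mathcal{S})$ be a measure space with probability measure $\theta$.
Suppose that there is a measurable mapping (probability kernel)
$w \in W \mapsto \rho_w^N \in \mathcal{M}_1^+(\Omega^N)$. Define the barycenter $\rho^N$ such that

\[ \rho^N(f) = \int_W \rho_w^N(f) \, d\theta(w) \]

for each $f \in C(\Omega^N)$. Then,

\[ S_N(\rho^N) \ge \int_W S_N(\rho_w^N) \, d\theta(w) . \tag{19} \]

Proof. Let $(u_1, \dots, u_R)$ be a continuous partition of unity on $\Omega^N$ such
that $\alpha^{\otimes N}(u_r) > 0$ for each $r$. By concavity of $\psi$ (Jensen's inequality),

\[ \sum_{r=1}^{R} \psi\big( \rho^N(u_r) / \alpha^{\otimes N}(u_r) \big) \, \alpha^{\otimes N}(u_r)
   \ge \sum_{r=1}^{R} \int_W \psi\big( \rho_w^N(u_r) / \alpha^{\otimes N}(u_r) \big) \, d\theta(w) \, \alpha^{\otimes N}(u_r)
   \ge \int_W S_N(\rho_w^N) \, d\theta(w) . \]

The second inequality holds because, by Property 1, the sum over $r$ is an
upper bound for $S_N(\rho_w^N)$ for each $w$. Since this is true for every such
partition, equation (19) follows. ■
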
The Gibbs function on $\mathcal{M}_1^+(\Omega^N)$ is defined as

\[ G_N(\rho^N) = N^{-1} \left[ S_N(\rho^N) - \rho^N(H_N) \right] . \tag{20} \]

(Let us reiterate that we have absorbed the inverse temperature $\beta$ into
the Hamiltonian.) The Gibbs measure is a measure $\rho_*^N \in \mathcal{M}_1^+(\Omega^N)$
such that $\rho_*^N \ll \alpha^{\otimes N}$ and

\[ \frac{d\rho_*^N}{d\alpha^{\otimes N}}(x) = Z(N)^{-1} \exp(-H_N(x)) . \]

Note that, since $\alpha^{\otimes n}(\phi) < \infty$, we know that $Z(N) > 0$, by an elementary
application of Jensen's inequality. An important formula for statistical
mechanics is the following.

Theorem 10 (Gibbs Variational Formula). The Gibbs function is
strictly concave and upper semicontinuous (on the set of measures where
it is not equal to $-\infty$). The maximum is attained at a unique point, which
is the measure $\rho_*^N$. Moreover, $G_N(\rho_*^N) = p(N)$.

Proof. Note that

\[ G_N(\rho^N) = N^{-1} S_N(\rho^N ; \rho_*^N) + p(N) , \]

where the first term on the right-hand-side is the relative entropy with
respect to $\rho_*^N$, divided by $N$. All of the properties from Proposition 8
are also valid for relative entropy with respect to measures other than
$\alpha^{\otimes N}$, mutatis mutandis. The theorem is just a collection of some of
these. (The only thing that changes is the precise statement of strong
subadditivity, which is not used in this theorem anyway.) ■
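
The Gibbs variational formula can be checked directly on a small finite system. The sketch below is a minimal illustration under stated assumptions: three Ising spins, a uniform a priori measure, and a hypothetical pair-interaction `hamiltonian` that is chosen only so that there is some $H_N$ to test with (it is not the Hamiltonian of the text). It verifies that the Gibbs function is bounded by the pressure, with equality at the Gibbs measure, and that the identity used in the proof holds for a random trial measure.

```python
import itertools
import math
import random

N = 3
CONFIGS = list(itertools.product([-1, 1], repeat=N))
ALPHA_N = [2.0 ** -N] * len(CONFIGS)     # product of uniform a priori measures

def hamiltonian(x):
    # Hypothetical pair interaction, used only to have some H_N to test with.
    return -sum(x[i] * x[j] for i in range(N) for j in range(i + 1, N)) / N

H = [hamiltonian(x) for x in CONFIGS]

def rel_entropy(rho, ref):
    """-sum rho log(rho / ref); assumes ref > 0 everywhere."""
    return -sum(r * math.log(r / q) for r, q in zip(rho, ref) if r > 0)

def gibbs_function(rho):
    """G_N(rho) = N^{-1} [ S_N(rho) - rho(H_N) ], as in (20)."""
    energy = sum(r * h for r, h in zip(rho, H))
    return (rel_entropy(rho, ALPHA_N) - energy) / N

Z = sum(a * math.exp(-h) for a, h in zip(ALPHA_N, H))
pressure = math.log(Z) / N
gibbs = [a * math.exp(-h) / Z for a, h in zip(ALPHA_N, H)]

rng = random.Random(1)
w = [rng.random() for _ in CONFIGS]
trial = [x / sum(w) for x in w]          # an arbitrary trial measure

print(gibbs_function(gibbs) - pressure)  # vanishes up to rounding
print(pressure - gibbs_function(trial))  # nonnegative
```

The gap `pressure - gibbs_function(trial)` equals the relative entropy of the trial measure with respect to the Gibbs measure, divided by $N$ and with the sign reversed, which is why it vanishes only at the Gibbs measure.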






Having stated the Gibbs formula, let us now state de Finetti's theorem.
To set this up, we will need some notation. If $\rho^N \in \mathcal{M}_1^+(\Omega^N)$ is symmetric
under the natural action of $\mathfrak{S}_N$ on $\Omega^N$, then it is called "exchangeable".
In this case $\rho^N \restriction A$ clearly only depends on the cardinality, say $R = |A|$.
As a notational simplification, when this is the case, we allow ourselves to
write $\rho^{N \restriction R}$ in place of $\rho^N \restriction A$. We will write the set of all exchangeable
measures in $\mathcal{M}_1^+(\Omega^N)$ as $\mathcal{M}_1^+(\Omega^N, \mathrm{Sym})$.

Definition: Given a strictly increasing sequence $(N(k) \in \mathbb{N}_+ : k \in \mathbb{N}_+)$
and a sequence of measures $\rho^{N(k)} \in \mathcal{M}_1^+(\Omega^{N(k)}, \mathrm{Sym})$, we will say that
the sequence converges weakly if, for every $N \in \mathbb{N}_+$, it happens that the
subsequence of marginals $(\rho^{N(k) \restriction N} : k, \ N(k) \ge N)$ converges weakly in
$\mathcal{M}_1^+(\Omega^N)$.

Because of properties of the marginal, it will be clear that, if the
sequence of measures $(\rho^{N(k)} : k \in \mathbb{N}_+)$ converges weakly, then the weak
limits $\rho^{\infty \restriction N} := \lim_{k \to \infty} \rho^{N(k) \restriction N}$ are consistent with respect to taking
further marginals. Therefore, the measures satisfy the hypotheses of
Kolmogorov's extension theorem. (C.f. [8], Theorem 12.1.2 or [29], Exercise
3.1.18.) So there is a naturally identified measure $\rho^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}})$, which
is defined on the smallest $\sigma$-algebra containing all cylinder sets (depending
on finitely many variables). Moreover $\rho^\infty$ is defined just so that the
finite-dimensional marginals are equal to $\rho^{\infty \restriction N}$, justifying the notation a
posteriori.

A measure $\rho^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}})$ is called exchangeable if all of its finite
marginals $\rho^{\infty \restriction N}$ are exchangeable. Let $\mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$ be the set of
exchangeable measures in $\mathcal{M}_1^+(\Omega^{\mathbb{N}})$. One may define a topology on the
set of exchangeable measures such that a sequence of measures
$\rho_k^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$ converges iff $\rho_k^{\infty \restriction N}$ converges (as $k \to \infty$) for each $N \in \mathbb{N}_+$.
This topology is metrizable and compact. Indeed it is the weak topology
with respect to the compact metrizable topology on $\Omega^{\mathbb{N}}$ (c.f., Theorem
IV.5). The de Finetti theorem completely characterizes the measures
in $\mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$.

Theorem 11 (de Finetti's Representation). For every measure
$\rho^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$, there is a unique $\rho \in \mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$ such that

\[ \rho^{\infty \restriction N} = \int_{\mathcal{M}_1^+(\Omega)} \mu^{\otimes N} \, d\rho(\mu) , \tag{21} \]

for every $N \in \mathbb{N}_+$.

For a general proof of this theorem, see the paper by Hewitt and
Savage, [16]. For many connections to interesting results in probability
theory, see the review of Aldous and references therein.

5.2 Results 

The first result, analogous to Theorem 2, is the following.

Theorem 12. For every $N_1, N_2 \ge n$,

\[ (N_1 + N_2) \, p(N_1 + N_2) \le N_1 \, p(N_1) + N_2 \, p(N_2) . \tag{22} \]

Also, for each $N \ge n$,

\[ p(N) \ge -\alpha^{\otimes n}(\phi) . \tag{23} \]

In particular the sequence $(p(N) : N \ge n)$ converges in $\mathbb{R}$.

The second main result of this section is the following formula for the 
pressure. In Section [5] this will be compared to Corollary |S| in the convex 
case. 

Theorem 13 (Gibbs, de Finetti Variational Principle). Define
$g : \mathcal{M}_1^+(\Omega) \to \mathbb{R} \cup \{-\infty\}$ by

\[ g(\mu) := S_1(\mu) - \Phi(\mu) . \]

Then, for every $N \ge n$,

\[ p(N) \ge \sup g(\mu) , \tag{24} \]

where the supremum is taken over all $\mu \in \mathcal{M}_1^+(\Omega)$. The function $g$ is
upper semicontinuous, so the maximum is attained. Moreover,

\[ p_* = \max g(\mu) . \tag{25} \]

5.3 Proofs 

Proof of Theorem 12. Suppose $\rho^{M+N} \in \mathcal{M}_1^+(\Omega^{M+N}, \mathrm{Sym})$. By the
definition of the sequence of (permutation invariant) Hamiltonians,

\[ (M+N)^{-1} \rho^{M+N}(H_{M+N}) = M^{-1} \rho^{M+N \restriction M}(H_M) = N^{-1} \rho^{M+N \restriction N}(H_N) . \]

This implies that

\[ \rho^{M+N}(H_{M+N}) = \rho^{M+N \restriction M}(H_M) + \rho^{M+N \restriction N}(H_N) . \]

By subadditivity of the entropy, which is Proposition 8, Property 6,
specialized to the case that $A = [1,M]$ and $B = [M+1, M+N]$, one knows

\[ S_{M+N}(\rho^{M+N}) \le S_M(\rho^{M+N \restriction M}) + S_N(\rho^{M+N \restriction N}) . \]

Therefore,

\[ (M+N) \, G_{M+N}(\rho^{M+N}) \le M \, G_M(\rho^{M+N \restriction M}) + N \, G_N(\rho^{M+N \restriction N}) . \tag{26} \]

Now apply this to $\rho_*^{M+N} \in \mathcal{M}_1^+(\Omega^{M+N}, \mathrm{Sym})$ and use Theorem 10. The
left-hand-side of (26) becomes $(M+N) \, p(M+N)$, and the two terms on
the right-hand-side are bounded above by $M \, p(M)$ and $N \, p(N)$.

The bound (23) is a variational lower bound obtained by the trial
$\rho^N = \alpha^{\otimes N}$ and Theorem 10. From it we know that $p_* > -\infty$, which
is important because subadditive sequences generally may have the limit
$-\infty$, but this one does not. ■

Let us prove the easy part of Theorem 13, which only uses Theorem 10.

Proof of Theorem 13, Equation (24). Suppose $\mu \in \mathcal{M}_1^+(\Omega)$. Define
$\rho^N = \mu^{\otimes N}$. Observe that $S_N(\rho^N) = N S_1(\mu)$ and

\[ \frac{1}{N} \rho^N(H_N) = \frac{1}{n} \rho^{N \restriction n}(H_n) = \mu^{\otimes n}(\phi) = \Phi(\mu) . \]

So $G_N(\rho^N) = g(\mu)$. Then, using Theorem 10, one obtains $p(N) \ge g(\mu)$
as a variational lower bound. The equation follows. ■

To prove the second half of Theorem 13 we will use the following
important fact. So far we have only used subadditivity of the entropy,
which is a special case of Proposition 8, Property 6. The next result uses
strong subadditivity; in fact it is equivalent to it.

Lemma 14. Suppose $N \in \mathbb{N}_+$ and $\rho^N \in \mathcal{M}_1^+(\Omega^N, \mathrm{Sym})$. Then,

\[ n^{-1} S_n(\rho^{N \restriction n}) \ge N^{-1} S_N(\rho^N) , \tag{27} \]

for every $n \in [1,N]$.

Proof. It is sufficient to prove that

\[ S_N(\rho^N) - S_{N-1}(\rho^{N \restriction N-1}) \le S_{N-1}(\rho^{N \restriction N-1}) - S_{N-2}(\rho^{N \restriction N-2}) , \tag{28} \]

for every $N > 1$. This is because, by iterating this inequality, one gets

\[ S_N(\rho^N) - S_{N-1}(\rho^{N \restriction N-1}) \le S_n(\rho^{N \restriction n}) - S_{n-1}(\rho^{N \restriction n-1}) , \]

for all $n \le N-1$. Summing these inequalities over $n \in [1, N-1]$ gives a
telescoping sum on the right-hand-side. So

\[ (N-1) \, S_N(\rho^N) - (N-1) \, S_{N-1}(\rho^{N \restriction N-1}) \le S_{N-1}(\rho^{N \restriction N-1}) - S_0(\rho^{N \restriction 0}) . \]

By rearranging terms, this proves (27) when $n = N-1$ (recall that
$S_0(\rho^{N \restriction 0}) := 0$). Iterating that, one reaches all $n \le N-1$.

It remains to prove (28). Use Proposition 8, Property 6 with
$A = [1, N-1]$ and $B = [2, N]$. ■

One of the most important consequences of de Finetti's theorem, for
us, is the fact that the relative entropy becomes very simple in the $N \to \infty$
limit for exchangeable measures. In fact it is affine. This is expressed in
the following lemma, which also uses Lemma 14.

Lemma 15 (Mean Entropy). For every $\rho^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$, the
following limit exists:

\[ s(\rho^\infty) := \lim_{N \to \infty} N^{-1} S_N(\rho^{\infty \restriction N}) . \]

The function $s : \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym}) \to \mathbb{R} \cup \{-\infty\}$ is affine and upper
semicontinuous. More precisely,

\[ s(\rho^\infty) = \int_{\mathcal{M}_1^+(\Omega)} S_1(\mu) \, d\rho(\mu) , \]

where $\rho \in \mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$ is the "directing measure" corresponding to $\rho^\infty$
via de Finetti's theorem.

Proof. The existence of the limit $s(\rho^\infty)$ can be proved either by
subadditivity (the specialization of Proposition 8, Property 6), or by monotonicity
of the entropy density as in Lemma 14. By the latter, it is clear that $s$ is
upper semicontinuous, being the infimum of upper semicontinuous
functions. Also, $s$ is concave by Proposition 8, Property 4. Moreover, one can
deduce that $s$ is convex by using Proposition 8, Property 5 and noting
that for the mean entropy one divides each $S_N$ by $N$, and takes the limit
as $N \to \infty$ (so that the error terms in "almost convexity" converge to
0 uniformly). Therefore, $s$ is affine. Using these properties and Lemma 9
one can prove that

\[ s(\rho^\infty) = \int_{\mathcal{M}_1^+(\Omega)} s(\delta_\mu) \, d\rho(\mu) . \]

(Actually, Lemma 9 only proves that the integral representation is a lower
bound for $s$. But using convexity and upper semicontinuity, one can easily
prove $s(\rho^\infty) \le \max_{\mu \in \mathrm{supp}(\rho)} s(\delta_\mu)$. By taking the correct partition, one
can then use this to obtain the appropriate opposite inequality.) But
when $\rho = \delta_\mu$, one has $\rho^{\infty \restriction N} = \mu^{\otimes N}$ for all $N$, and as already noted
$S_N(\mu^{\otimes N}) = N S_1(\mu)$. So $s(\delta_\mu) = S_1(\mu)$. ■
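
Lemma 14 and Lemma 15 can be illustrated numerically for the simplest exchangeable measure that is not a product: a mixture of two i.i.d. Bernoulli measures on $\{0,1\}$, with uniform a priori measure. This is a minimal sketch under those assumptions; the weights, the success probabilities and the helper names are hypothetical choices for the illustration. Since every configuration with $k$ ones has the same probability, $S_N$ can be computed exactly as a sum over $k$.

```python
import math

def s1(p):
    """Entropy S_1 of Bernoulli(p) relative to the uniform measure on {0, 1}."""
    return -sum(q * math.log(2.0 * q) for q in (p, 1.0 - p) if q > 0)

def entropy_density(N, weights, probs):
    """N^{-1} S_N of the N-site marginal of sum_i w_i Bernoulli(p_i)^{(x)N}."""
    total = 0.0
    for k in range(N + 1):
        # Probability of one fixed configuration with k ones (exchangeability).
        rho = sum(w * p ** k * (1.0 - p) ** (N - k) for w, p in zip(weights, probs))
        if rho > 0:
            # The a priori product measure gives each configuration mass 2^{-N}.
            total -= math.comb(N, k) * rho * math.log(rho * 2.0 ** N)
    return total / N

weights, probs = (0.5, 0.5), (0.2, 0.8)   # hypothetical directing measure
limit = sum(w * s1(p) for w, p in zip(weights, probs))
densities = [entropy_density(N, weights, probs) for N in range(1, 61)]

for N in (1, 2, 10, 60):
    print(N, densities[N - 1], limit)
```

The printed densities decrease in $N$, as Lemma 14 requires, and approach the integral of $S_1$ against the directing measure, as in Lemma 15. By "almost convexity" the distance to the limit is at most $\log(2)/N$ for this two-component mixture.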

Proof of Theorem 13, Equation (25). Let $(\rho_*^{N(k)} : k \in \mathbb{N}_+)$ be any
weakly convergent subsequence of the Gibbs measures (which exists
because the set of all such sequences is compact with respect to the topology
of weak convergence), and let $\rho_*^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$ be the limit.
Fix $N \ge n$. By Lemma 14,

\[ p(N(k)) = G_{N(k)}(\rho_*^{N(k)}) \le G_N(\rho_*^{N(k) \restriction N}) , \]

for all $k$ such that $N(k) \ge N$. By Theorem 10, $G_N$ is upper
semicontinuous. Therefore,

\[ \limsup_{k \to \infty} p(N(k)) \le G_N(\rho_*^{\infty \restriction N}) . \]

On the other hand, $p(N)$ converges to $p_*$ by Theorem 12. So

\[ p_* \le G_N(\rho_*^{\infty \restriction N}) . \]

Since this inequality is true for every $N \ge n$, it is also true that

\[ p_* \le \inf_{N \in \mathbb{N}_+} G_N(\rho_*^{\infty \restriction N}) . \tag{29} \]

Define another affine, upper semicontinuous function
$G_\infty : \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym}) \to \mathbb{R} \cup \{-\infty\}$ by

\[ G_\infty(\rho^\infty) = s(\rho^\infty) - \rho^{\infty \restriction n}(\phi) . \]

Using Lemma 14 one can conclude that $G_N(\rho^{\infty \restriction N})$ is a decreasing
sequence, converging to $G_\infty(\rho^\infty)$. Then, using this and (29),

\[ p_* \le G_\infty(\rho_*^\infty) . \]

This is true for each limit point, and there is at least one. Therefore,

\[ p_* \le \sup_{\rho^\infty \in \mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})} G_\infty(\rho^\infty) . \]

Since $\mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$ is compact and convex, and $G_\infty$ is a convex (in fact
affine) and upper semicontinuous function, the maximum is achieved, and
it is achieved at an extreme point. By Theorem 11 the extreme points
are of the form $\rho^\infty = \mu^{\otimes \mathbb{N}}$ for some $\mu \in \mathcal{M}_1^+(\Omega)$. In other words, the
measure $\rho \in \mathcal{M}_1^+(\mathcal{M}_1^+(\Omega))$ defined via de Finetti's theorem is $\delta_\mu$ for some
$\mu \in \mathcal{M}_1^+(\Omega)$ if $\rho^\infty$ is an extreme point of $\mathcal{M}_1^+(\Omega^{\mathbb{N}}, \mathrm{Sym})$. In this case,
one can explicitly calculate $G_\infty(\rho^\infty)$. It is $g(\mu)$. Hence $p_* \le \max g(\mu)$,
and together with (24) this proves (25). This also proves that
$g$ is upper semicontinuous because it is the restriction of $G_\infty$, and that
function is upper semicontinuous. ■

6 Minimax Theorem and a Joint Lagrangian 
6.1 Setup 

Recall that, under the hypothesis that $\|\phi\|_\infty < \infty$,

P* = P* , 

by Corollary Q Therefore, there is a strong connection between the ex- 
tended variational principle and the Gibbs, de Finetti principle. We will 
make one more connection, by constructing a joint "Lagrangian". The 
joint Lagrangian we construct is the function 

\[ \mathcal{L}(\mu,\nu) = S_1(\mu) - \Phi(\nu) - \Phi^{(1)}(\nu, \mu - \nu) . \]

Since the Gibbs function is concave and upper semicontinuous no matter
what the Hamiltonian, we see that $\mathcal{L}(\cdot,\nu)$ is concave and upper
semicontinuous for all $\nu$. Moreover, it is trivial to check that

\[ \max_{\mu} \mathcal{L}(\mu,\nu) = \tilde{g}(\nu) , \]

using Theorem 10. Similarly, using convexity of $\Phi$ it is trivial to check
that

\[ \inf_{\nu} \mathcal{L}(\mu,\nu) = \min_{\nu} \mathcal{L}(\mu,\nu) = g(\mu) . \]

The minimum is attained at $\nu = \mu$. This follows from the convexity
inequality for $\Phi$, namely $\Phi(\mu) \ge \Phi(\nu) + \Phi^{(1)}(\nu, \mu - \nu)$. In the
concave case, the analogous inequality proves that

\[ \sup_{\nu} \mathcal{L}(\mu,\nu) = \max_{\nu} \mathcal{L}(\mu,\nu) = g(\mu) . \]

The main purpose of this section is to prove Theorem 7. For this
purpose, we will use the following generalization of von Neumann's minimax
theorem. We refer to [27] for an elegant (rather topological) proof.

Theorem 16 (Kneser, Fan Minimax Theorem). Let $\mathcal{M}$ be a compact,
convex space and let $\mathcal{N}$ be any convex space. Suppose that $\mathcal{L}$ is a function
on $\mathcal{M} \times \mathcal{N}$ that is concave-convex. If $\mathcal{L}$ is upper semicontinuous on $\mathcal{M}$ for
each $\nu \in \mathcal{N}$, then

\[ \sup_{\mu \in \mathcal{M}} \inf_{\nu \in \mathcal{N}} \mathcal{L}(\mu,\nu) = \inf_{\nu \in \mathcal{N}} \sup_{\mu \in \mathcal{M}} \mathcal{L}(\mu,\nu) . \]

Remark: The Kneser, Fan theorem generalizes the "von Neumann minimax
theorem", which is well-known as one of the first mathematical results
in game theory.

The definition of being concave-convex is that for each $\nu \in \mathcal{N}$ the
function $\mathcal{L}(\cdot,\nu)$ should be concave on $\mathcal{M}$, and for each $\mu \in \mathcal{M}$ the function
$\mathcal{L}(\mu,\cdot)$ should be convex on $\mathcal{N}$. Note that in the $n = 2$ case, we can write

\[ \mathcal{L}(\mu,\nu) = S_1(\mu) + \Phi(\nu) - \Phi^{(1)}(\nu,\mu) , \]

which is convex in $\nu$ as long as $\Phi$ is convex, because $\Phi^{(1)}(\cdot,\mu)$ is linear.
Therefore, in this case $\mathcal{L}(\mu,\nu)$ is concave-convex. Among other things,
this means that $\tilde{g}$ is convex. One requirement for applying the Kneser, Fan
theorem is that the function $\mathcal{L}$ is assumed to map into $\mathbb{R}$ (instead of
$\mathbb{R} \cup \{\pm\infty\}$). This is the reason that we stated Theorem 7 in the precise
way we did.
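
The minimax identity can be seen concretely in the smallest nontrivial case. The sketch below is an illustration under explicit assumptions: $\Omega = \{+1,-1\}$, uniform a priori measure, $n = 2$, and the kernel $\phi(x,y) = -(x-y)^2$ (the example treated in Section 7). Measures on $\Omega$ are parametrized by a grid, and the function names are hypothetical. It evaluates the joint Lagrangian in the $n = 2$ form and compares the max-min with the min-max.

```python
import math

POINTS = (1.0, -1.0)     # Omega = {+1, -1}
ALPHA = (0.5, 0.5)       # uniform a priori measure (an assumption of this sketch)

def phi(x, y):
    return -(x - y) ** 2

def s1(mu):
    """Relative entropy S_1(mu) with respect to ALPHA."""
    return -sum(p * math.log(p / a) for p, a in zip(mu, ALPHA) if p > 0)

def big_phi(nu):
    """Phi(nu) = (nu x nu)(phi)."""
    return sum(p * q * phi(x, y)
               for x, p in zip(POINTS, nu) for y, q in zip(POINTS, nu))

def big_phi1(nu, mu):
    """Phi^(1)(nu, mu) = 2 (nu x mu)(phi) in the n = 2 case."""
    return 2.0 * sum(p * q * phi(x, y)
                     for x, p in zip(POINTS, nu) for y, q in zip(POINTS, mu))

def lagrangian(mu, nu):
    """L(mu, nu) = S_1(mu) + Phi(nu) - Phi^(1)(nu, mu)."""
    return s1(mu) + big_phi(nu) - big_phi1(nu, mu)

# Probability vectors (m, 1 - m) on a grid that contains the uniform measure.
GRID = [(i / 100.0, 1.0 - i / 100.0) for i in range(101)]
maximin = max(min(lagrangian(mu, nu) for nu in GRID) for mu in GRID)
minimax = min(max(lagrangian(mu, nu) for mu in GRID) for nu in GRID)
print(maximin, minimax)
```

In this symmetric case the saddle point is the uniform measure, which lies on the grid, so the two numbers agree up to rounding. For a general a priori measure the grid values would only agree up to discretization error.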



6.2 Proofs 

In order to prove Theorem 7 we will need more information about the
maximizer of $g$. Note that, since $g$ is upper semicontinuous, and $\mathcal{M}_1^+(\Omega)$
is a compact set, it does attain its maximum. If $\Phi$ is convex, then $g$ is
also strictly concave simply because

\[ g(\mu) = S_1(\mu) - \Phi(\mu) , \]

and $S_1$ is strictly concave. Therefore, the maximum is unique. In order to
state the following lemma, let $C_\phi$ be the finite constant $C_\phi = \inf_{x \in \Omega^n} \phi(x)$.
Let $C_\alpha = \alpha^{\otimes n}(\phi) < \infty$. Note that $C_\alpha \ge C_\phi$.

Lemma 17. Let $\mu_* \in \mathcal{M}_1^+(\Omega)$ be the maximizer of $g$. Then $\mu_* \ll \alpha$ and

\[ \frac{d\mu_*}{d\alpha}(x) = \exp\big( C_* - \Phi^{(1)}(\mu_*, \delta_x) \big) \tag{30} \]

for $\alpha$-a.e. $x \in \Omega$. Here, $C_*$ is a finite constant related to $\mu_*$ and $p_*$ by

\[ C_* = \Phi(\mu_*) - p_* = S_1(\mu_*) - 2 p_* . \]

In particular, one has the bounds

\[ C_\alpha + C_\phi \le C_* \le 2 C_\alpha , \]

so that

\[ \frac{d\mu_*}{d\alpha} \le \exp\big( 2 [ C_\alpha - C_\phi ] \big) . \]

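Proof. Note that $g$ is finite on $\alpha$, so that $g(\mu_*) > -\infty$. In particular,
this means that $S_1(\mu_*) > -\infty$. So $\mu_* \ll \alpha$. Suppose, in order to reach
a contradiction, that $\mathrm{supp}(\alpha) \setminus \mathrm{supp}(\mu_*) \ne \emptyset$. Then there is a ball
$B = B(x;r) \subset \Omega$, $r > 0$, such that $\mu_*(B) = 0$ and $\alpha(B) > 0$. Let

\[ \nu := \alpha(B)^{-1} \chi_B \, \alpha , \]

where $\chi_B$ is the indicator function of $B$. Let

\[ \mu_\epsilon := (1-\epsilon) \cdot \mu_* + \epsilon \cdot \nu . \]

A straightforward calculation shows that

\[ \lim_{\epsilon \downarrow 0} \epsilon^{-1} \left[ S_1(\mu_\epsilon) - S_1(\mu_*) \right] = +\infty , \]

whereas

\[ \lim_{\epsilon \downarrow 0} \epsilon^{-1} \left[ \Phi(\mu_\epsilon) - \Phi(\mu_*) \right] = \Phi^{(1)}(\mu_*, \nu - \mu_*) \]

is a finite number. Hence there is an $\epsilon > 0$ small enough so that
$g(\mu_\epsilon) > g(\mu_*)$, contradicting the fact that $\mu_*$ is a maximizer.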

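Now let $B = B(x_0; r) \subset \Omega$, for some $x_0 \in \mathrm{supp}(\mu_*)$ and $r > 0$. Let

\[ \nu := \mu_*(B)^{-1} \chi_B \, \mu_* . \]

For $t \in \mathbb{R}$, let

\[ \mu_t = (1-t) \cdot \mu_* + t \cdot \nu . \]

Note that for $-\mu_*(B) < t < 1$, one has that $\mu_t \in \mathcal{M}_1^+(\Omega)$. It is easy to
see that the following function is continuously differentiable,

\[ \gamma(t) := g(\mu_t) = S_1(\mu_t) - \Phi(\mu_t) . \]

Moreover, the derivative at 0 is

\[ \gamma'(0) = - \int_\Omega \log\!\left( \frac{d\mu_*}{d\alpha}(x) \right) d\nu(x) - \Phi^{(1)}(\mu_*, \nu) - S_1(\mu_*) + \Phi^{(1)}(\mu_*, \mu_*) . \]

By criticality, this must equal 0. So

\[ \int_\Omega \log\!\left( \frac{d\mu_*}{d\alpha}(x) \right) d\nu(x) + \Phi^{(1)}(\mu_*, \nu) = C_* , \]

where

\[ C_* = \Phi^{(1)}(\mu_*, \mu_*) - S_1(\mu_*) \]

is independent of $x_0$ and $r$. Note that since $g(\mu_*) = p_*$, this gives the
previous formulas for $C_*$. Note that

\[ \int_\Omega \left[ \log\!\left( \frac{d\mu_*}{d\alpha}(x) \right) + \Phi^{(1)}(\mu_*, \delta_x) - C_* \right] d\nu(x) = 0 . \]

Since the total integral equals zero for all $\nu$, and $x_0$ and $r$ are arbitrary,
one concludes that

\[ \log\!\left( \frac{d\mu_*}{d\alpha}(x) \right) + \Phi^{(1)}(\mu_*, \delta_x) - C_* = 0 \]

for almost every $x \in \mathrm{supp}(\mu_*)$. But $\mathrm{supp}(\mu_*) = \mathrm{supp}(\alpha)$. Exponentiating
this equation yields (30). ■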

Proof of Theorem 7. Observe that, for any $0 < C < \infty$, the subset
$\mathcal{M}_1^+(\Omega,\alpha,C)$ is compact and convex in $\mathcal{M}_1^+(\Omega)$. Also, $\mathcal{L}(\mu,\nu)$ is
well-defined and finite for all $\mu, \nu \in \mathcal{M}_1^+(\Omega,\alpha,C)$. Part of this statement
is that $\Phi(\nu)$ and $\Phi^{(1)}(\nu,\mu)$ are finite. This is tantamount to the first
remark following the statement of Theorem 7. The other fact is that
$S_1(\mu)$ is finite, because $S_1(\mu) \ge \psi(e^C) > -\infty$. Therefore, the hypotheses
of Theorem 16 are satisfied, so that

\[ \sup_{\mu \in \mathcal{M}} \inf_{\nu \in \mathcal{N}} \mathcal{L}(\mu,\nu) = \inf_{\nu \in \mathcal{N}} \sup_{\mu \in \mathcal{M}} \mathcal{L}(\mu,\nu) , \tag{31} \]

when $\mathcal{M} = \mathcal{N} = \mathcal{M}_1^+(\Omega,\alpha,C)$.

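By the convexity inequality for $\Phi$, for any $\mu \in \mathcal{M}_1^+(\Omega)$,

\[ \inf_{\nu \in \mathcal{M}_1^+(\Omega)} \mathcal{L}(\mu,\nu) = g(\mu) . \]

Moreover, the minimum is attained at $\nu = \mu$. In particular, if
$\mu \in \mathcal{M}_1^+(\Omega,\alpha,C)$, then so is the minimizer $\nu$. I.e.,

\[ \inf_{\nu \in \mathcal{N}} \mathcal{L}(\mu,\nu) = g(\mu) \]

for all $\mu \in \mathcal{M}$. Therefore,

\[ \sup_{\mu \in \mathcal{M}} g(\mu) = \inf_{\nu \in \mathcal{N}} \sup_{\mu \in \mathcal{M}} \mathcal{L}(\mu,\nu) , \tag{32} \]

by (31).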

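By Theorem 10,

\[ \sup_{\mu \in \mathcal{M}_1^+(\Omega)} \mathcal{L}(\mu,\nu) = \tilde{g}(\nu) , \tag{33} \]

for any $\nu \in \mathcal{M}_1^+(\Omega)$, by viewing $\Phi^{(1)}(\nu,\mu)$ as a ($\nu$-dependent)
Hamiltonian integrated against $\mu$. So, optimizing over the smaller set gives the
inequality

\[ \sup_{\mu \in \mathcal{M}} \mathcal{L}(\mu,\nu) \le \tilde{g}(\nu) . \]

Therefore,

\[ \sup_{\mu \in \mathcal{M}_1^+(\Omega,\alpha,C)} g(\mu) \le \inf_{\nu \in \mathcal{M}_1^+(\Omega,\alpha,C)} \tilde{g}(\nu) , \tag{34} \]

by (32).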

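By Lemma 17 the unrestricted optimizer of $g$ over $\mathcal{M}_1^+(\Omega)$ is $\mu_*$, which
is in $\mathcal{M}_1^+(\Omega,\alpha,C)$ for every $C \ge 2(C_\alpha - C_\phi)$. Moreover, $g(\mu_*) = p_*$. So,
by (34),

\[ p_* \le \inf_{\nu \in \mathcal{M}_1^+(\Omega,\alpha,C)} \tilde{g}(\nu) \]

for every $C \ge 2(C_\alpha - C_\phi)$. In particular,

\[ \lim_{C \to \infty} \inf_{\nu \in \mathcal{M}_1^+(\Omega,\alpha,C)} \tilde{g}(\nu) \ge p_* . \tag{35} \]

The proof will be completed by also establishing the opposite inequality.
As noted, $\mu_*$ is in $\mathcal{M}_1^+(\Omega,\alpha,C)$ for $C \ge 2(C_\alpha - C_\phi)$. Therefore,

\[ \lim_{C \to \infty} \inf_{\nu \in \mathcal{M}_1^+(\Omega,\alpha,C)} \tilde{g}(\nu) \le \tilde{g}(\mu_*) . \tag{36} \]

By (33),

\[ \tilde{g}(\mu_*) = \Phi(\mu_*) + \log \int_\Omega \exp\big( -\Phi^{(1)}(\mu_*, \delta_x) \big) \, d\alpha(x) . \]

But, by equation (30),

\[ \int_\Omega \exp\big( -\Phi^{(1)}(\mu_*, \delta_x) \big) \, d\alpha(x) = \exp(-C_*) \int_\Omega \frac{d\mu_*}{d\alpha}(x) \, d\alpha(x) = \exp(-C_*) . \]

Therefore,

\[ \tilde{g}(\mu_*) = \Phi(\mu_*) - C_* . \]

But also by Lemma 17,

\[ \Phi(\mu_*) - C_* = p_* . \]

Therefore, combining with (36),

\[ \lim_{C \to \infty} \inf_{\nu \in \mathcal{M}_1^+(\Omega,\alpha,C)} \tilde{g}(\nu) \le p_* , \]

as needed. ■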

Remark: A posteriori it is clear that there is a saddle point for the
Lagrangian $\mathcal{L}(\mu,\nu)$ at $\mu = \nu = \mu_*$. However, since $\mathcal{L}$ may not be strictly
concave-convex, this may not be the only argminmax or argmaxmin. (C.f.,
}2<Hf . Chapter 11, Sections I and J, for the relevant notation from convex 
variational analysis.) If one could establish that $\tilde{g}$ has an optimizer which
can be identified by the Euler-Lagrange equations, then it must also be
an optimizer for $g$ because the Euler-Lagrange equations are the same.
However, except in the case that $\phi$ is bounded and continuous, it is not
clear that this is the case a priori.

7 Example: The Negative Quadratic Kernel

Let us consider $\Omega \subset \mathbb{R}^d$ compact, and $\phi(x,y) = -\|x-y\|^2$. It is well-known
that this defines a positive semidefinite form $\Phi^{(1)} : \mathcal{M}_0(\Omega) \times \mathcal{M}_0(\Omega) \to \mathbb{R}$
by the map $\Phi^{(1)}(\mu,\nu) = 2 (\mu \otimes \nu)(\phi)$, where $\mathcal{M}_0(\Omega)$ is the set of all
bounded-variation, signed measures with total measure equal to 0. (In
fact this is the critical homogeneous potential with this property. C.f.,
Schoenberg [25].) Therefore, $\Phi$ is convex. We note that, for $\nu \in \mathcal{M}_1^+(\Omega)$,

\[ \tfrac{1}{2} \Phi^{(1)}(\nu, \delta_x) = - \int_\Omega \|x-y\|^2 \, d\nu(y)
   = - \|x\|^2 + 2 \int_\Omega \langle x, y \rangle \, d\nu(y) - \int_\Omega \|y\|^2 \, d\nu(y)
   = - \|x - \mathbf{E}[X]\|^2 - \mathrm{Var}(X) , \]

where $X$ is a random variable which is $\nu$-distributed. Using this, we also
have

\[ \Phi(\nu) = - \mathrm{Var}(X) - \int_\Omega \|x - \mathbf{E}[X]\|^2 \, d\nu(x) = - 2 \, \mathrm{Var}(X) . \]

Therefore,

\[ \tilde{g}(\nu) = \Phi(\nu) + \log \int_\Omega \exp\big( -\Phi^{(1)}(\nu, \delta_x) \big) \, d\alpha(x)
   = - 2 \, \mathrm{Var}(X) + \log \int_\Omega \exp\big( 2 \, \mathrm{Var}(X) + 2 \|x - \mathbf{E}[X]\|^2 \big) \, d\alpha(x)
   = \log \int_\Omega \exp\big( 2 \|x - \mathbf{E}[X]\|^2 \big) \, d\alpha(x) . \]

In particular, this only depends on $\nu$ through $\mathbf{E}^\nu[X]$. (We will write $\mathbf{E}^\nu[X]$
when we want to specify that $X$ is $\nu$-distributed.) Given any $x_0 \in \Omega$, we
can choose $\nu = \delta_{x_0}$ so that there is at least one $\nu$ such that $\mathbf{E}^\nu[X] = x_0$.
Therefore, the extended variational principle tells us that

\[ p_* = \min_{y \in \Omega} \log \int_\Omega \exp\big( 2 \|x - y\|^2 \big) \, d\alpha(x) . \]

This is obviously a convex optimization problem, where the convex
cost functional to be minimized is

\[ C(y) = \log \int_\Omega \exp\big( 2 \|x - y\|^2 \big) \, d\alpha(x) . \]

Moreover, since $\Omega$ is compact and since the cost functional is continuous,
there does exist a unique solution. Notice that the criticality condition is
the implicit characterization:

\[ y = \frac{\int_\Omega x \, e^{2 \|x-y\|^2} \, d\alpha(x)}{\int_\Omega e^{2 \|x-y\|^2} \, d\alpha(x)} . \]

This example contains mean-field Ising and Heisenberg antiferromagnets
as special cases. These are obtained by taking $\Omega = \mathbb{S}^{d-1}$, the spheres in
$\mathbb{R}^d$. The Ising case is $d = 1$ for which we have $\mathbb{S}^0 = \{-1, +1\}$. We can
include a one-body term, representing an external magnetic field, by a
special choice of the a priori measure. We can also determine the Gibbs
measure. It is equal to

\[ \frac{d\mu_*}{d\alpha}(x) = Z^{-1} e^{2 \|x - y\|^2} , \]

where $y$ solves the optimization problem above.
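
For the Ising case the criticality condition is a scalar fixed-point equation, and the duality between the two variational problems can be checked numerically. The following is a minimal sketch under stated assumptions: the a priori measure `ALPHA` is a hypothetical non-uniform choice (playing the role of an external field), and $y$ is allowed to range over the interval $[-1,1]$ of possible means $\mathbf{E}^\nu[X]$, which is how this sketch reads the minimization set when $\Omega$ is not convex. The criticality condition is solved by bisection, and the value of the cost functional is compared with $g(\mu) = S_1(\mu) - \Phi(\mu)$ at the measure on $\{+1,-1\}$ whose mean is the solution.

```python
import math

POINTS = (1.0, -1.0)     # Omega = S^0 = {+1, -1}
ALPHA = (0.7, 0.3)       # hypothetical non-uniform a priori measure

def cost(y):
    """C(y) = log int exp(2 |x - y|^2) d alpha(x)."""
    return math.log(sum(a * math.exp(2.0 * (x - y) ** 2)
                        for x, a in zip(POINTS, ALPHA)))

def tilted_mean(y):
    """Right-hand side of the criticality condition."""
    w = [a * math.exp(2.0 * (x - y) ** 2) for x, a in zip(POINTS, ALPHA)]
    return sum(x * wi for x, wi in zip(POINTS, w)) / sum(w)

def solve_criticality(lo=-1.0, hi=1.0, iters=100):
    """Bisection on h(y) = y - tilted_mean(y), which changes sign on [-1, 1]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - tilted_mean(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g(m):
    """g(mu) = S_1(mu) - Phi(mu) for mu = (m, 1 - m), with Phi(mu) = -2 Var(X)."""
    mu = (m, 1.0 - m)
    s1 = -sum(p * math.log(p / a) for p, a in zip(mu, ALPHA) if p > 0)
    mean = sum(x * p for x, p in zip(POINTS, mu))
    var = sum(p * (x - mean) ** 2 for x, p in zip(POINTS, mu))
    return s1 + 2.0 * var

y_star = solve_criticality()
m_star = 0.5 * (1.0 + y_star)    # the measure on {+1, -1} with mean y_star
print(y_star, cost(y_star), g(m_star))
```

Bisection is used instead of iterating the fixed-point map because, in this two-point example, the map is a decreasing function of $y$ whose slope can exceed 1 in absolute value, so plain iteration need not converge. The agreement of `cost(y_star)` and `g(m_star)` is the equality of the minimum of the cost functional with the maximum of $g$.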

If we change $\phi$ to $-\phi$, we obtain the ferromagnetic version of these
mean-field models. However, the analogous cost function becomes

\[ C(y) = \log \int_\Omega \exp\big( -2 \|x - y\|^2 \big) \, d\alpha(x) , \]

and we have $p_* = \max_{y \in \Omega} C(y)$. Since the cost function is not concave,
there can be multiple optimizers (depending on $\Omega$ and $\alpha$) which may be
interpreted as the existence of a phase transition.